DOCUMENT RESUME 



ED 435 655 



TM 030 231 



AUTHOR          Lwanga, S. K., Ed.; Tye, Cho-Yook, Ed.; Ayeni, O., Ed.
TITLE           Teaching Health Statistics: Lesson and Seminar Outlines.
                Second Edition.
INSTITUTION     World Health Organization, Geneva (Switzerland).
ISBN            ISBN-92-4-154518-6
PUB DATE        1999-00-00
NOTE            240p.
AVAILABLE FROM  World Health Organization, Marketing and Dissemination, 1211
                Geneva 27, Switzerland (72 Swiss francs; U.S. $64.80).
PUB TYPE        Guides - Classroom - Teacher (052)
EDRS PRICE      MF01/PC10 Plus Postage.
DESCRIPTORS     Administrators; Health; *Health Personnel; *Lesson Plans;
                Research Methodology; Statistical Analysis; *Statistics;
                Teaching Methods



ABSTRACT 



This book provides a selection of 23 lesson and seminar
outlines designed to encourage the teaching of health statistics. It 
concentrates on a core of statistical knowledge judged important for all 
categories of health trainees, including medical students. Emphasis is placed 
on statistical principles and methods that can help health personnel make 
rational decisions concerning the management of individual patients or the 
monitoring of health systems. Topics represent an internationally applicable 
basic curriculum that reflects technological developments in data handling 
and information communication. Lessons and seminars are presented in sections 
related to: (1) statistical principles and methods; (2) health statistics; 
and (3) statistics in medicine. Attachments (annexes) contain supplementary 
data sets, statistical tables, and a chart of random numbers. (SLD)
Preface



The need for a statistical approach is now well recognized in epidemiology and 
public health, since these fields are concerned with communities or populations 
where the laws of large numbers and random fluctuations clearly apply. 
Teachers of health workers and students, however, have been slow to recognize 
the need for a knowledge of statistics, even though all aspects of diagnosis and 
prognosis are affected by rules of probability. 

This book is intended to contribute to the long-term reorientation of the health 
information systems of Member States, by bringing about improved data 
generation, handling, processing and use, in order to meet future health 
requirements. 

The extent of statistical knowledge and skills that students need to acquire varies
from country to country, according to such factors as the common health
problems and methods of delivering health care in the country, and the career
prospects of the students on graduation. Nevertheless, there is a core of statistical
knowledge that all students need to have, irrespective of their country of
training. 

The present set of outlines is a revised version of Teaching health statistics: twenty 
lesson and seminar outlines (Lwanga SK & Tye C-Y, eds. Geneva, World Health
Organization, 1986). The topics covered form an internationally acceptable standard
basic curriculum for teaching health statistics. While based on those of the
first edition, the lesson and seminar outlines have been revised and updated in 
both content and orientation. They cover not only the conventional topics of 
data collection, presentation and analysis, probability and vital statistics, but also 
such topics as health indicators, use of computers and rapid methods of interim 
assessment. The concepts highlighted by the outlines should be useful to all 
students in the health field, and are meant to be used selectively by teachers of 
statistics in preparing their courses. 

This new edition is a result of close collaboration between a number of 
eminent teachers of statistics, and has been coordinated and edited by 
Mr S. K. Lwanga, Statistician, Department of Health Systems, World Health 
Organization, with valuable assistance from Dr O. Ayeni, Biostatistician, Special 
Programme of Research, Development and Research Training in Human 
Reproduction, Department of Reproductive Health and Research, World Health 
Organization. 

The preparation of the first edition of this book was conceived by Dr Boga 
Skrinjar-Nerima while she was Chief Medical Officer at the World Health Organization
in charge of the development of health statistical services. Her contribution
is still highly appreciated.

The World Health Organization wishes to thank the following eminent teachers
who made invaluable contributions to this edition of lesson and seminar outlines:
Professor E. Bamgboye, Department of Family and Community Medicine,
College of Medicine, King Saud University, Riyadh, Saudi Arabia; Professor R.
Biritwum, Department of Community Health, University of Ghana Medical
School, Accra, Ghana; Professor A. Indrayan, Division of Biostatistics and Medical
Informatics, University College of Medical Sciences, Delhi, India; and Professor
K. Sumbuloglu, Department of Biostatistics, Hacettepe University, Faculty of
Medicine, Ankara, Turkey. Thanks are also due to all the teachers and colleagues 
who contributed to the first edition of the book or reviewed the various drafts of 
this revised version. 

This publication is specially dedicated to the memory of the late Ron Lowe, C.B.E., 
Emeritus Professor of Community Medicine, Welsh National School of Medicine,
Cardiff, Wales, who helped and guided the Organization's efforts towards
the improvement of the teaching of statistics and their use in epidemiology and 
public health. 



Introduction 



Historical background 

Over the past twenty years, the World Health Organization has been devising 
strategies for improving the teaching of statistics to health personnel, in recognition
of the need to train future health workers in information support to health
care delivery and management. Earlier efforts were directed towards improving 
the teaching abilities of teachers by promoting modern educational techniques. 
This was done through workshops, seminars and meetings. The main products 
of these efforts were: 

• A manual for teachers of medical students,¹ sponsored jointly by WHO and
the International Epidemiological Association; this was a direct result of one 
of the recommendations made by a group of teachers who participated in a 
WHO travelling seminar in 1973. 

• A report of the Inter-Regional Conference on Teaching Statistics to Medical 
Undergraduates, held in Karachi, Pakistan, in 1978. 

• A guide on organizing a workshop for those responsible for teaching statistics
to medical students.

The extent of the statistical knowledge and skills that health workers need to 
acquire varies from country to country, depending on the health problems of 
the country, the technological capacity of the country to handle these problems, 
and the career prospects of the students on graduation. Nevertheless there is a 
core of statistical knowledge that all students need to have, irrespective of their 
country of training. In recognition of this need, the 1978 Inter-Regional Conference
in Karachi recommended:

", . . the development, under WHO, of an internationally acceptable standard 
basic curriculum for teaching health statistics to medical students, which could 
be adapted by medical schools to meet the needs and conditions of their own 



In response to this recommendation, the World Health Organization coordinated 
the development of a series of lesson and seminar outlines, published in 1986 as 
Teaching health statistics: twenty lesson and seminar outlines . Those outlines were 
prepared by a group of teachers with long experience of teaching statistics to 
medical undergraduates. The outlines aimed at offering teachers of statistics to 
medical undergraduates a starting point for organizing the material they should 
teach. 



1 Lowe CR, Lwanga SK, eds. Health statistics: a manual for teachers of medical students. Oxford, Oxford 
University Press, 1978. 



countries. 
Rationale for revising the outlines

Since the publication of the original outlines, there have been extensive technological
developments in data handling and information communication. The aim
of health care is no longer merely to cure illness; it now also includes the
prevention of illness and the maintenance of health. The broad field of health is now
seen as encompassing the social, biological and economic environment of 
people. The principle of primary health care is now accepted by all countries. 
The delivery of health care (in its broad sense) is, therefore, no longer the 
responsibility of medical doctors alone. 

Educational material to improve the capability of health workers to handle data 
and use these data in monitoring their activities should, therefore, be aimed 
at all future health workers, not medical students alone. The training material 
should also cover areas relevant to the objective monitoring of health programmes
and activities. With these requirements in mind, the outlines have
been revised to guide the teacher in deciding what to teach health workers (not 
necessarily medical students only) if they are to be objective in monitoring their 
activities. 

The following new topics have been included in the revised version: 

• indicators of levels of health; 

• health information systems; 

• use of computers in health sciences; 

• rapid methods for interim assessment. 



Need for a comprehensive course in health statistics 

Knowledge of, and competence in, the application of statistical principles and 
methods are necessary, not only for an understanding of the biological and medical 
sciences, but also for effective practice in any of the health professions. Because 
of the variability of biological, clinical and laboratory data, the science of statistics
is necessary and central to their understanding and interpretation.

Every student training in the health field should complete a course in health 
statistics for the following reasons. 

• A knowledge of statistics is required in order both to understand the rationale
on which diagnostic, prognostic and therapeutic decisions are, or should
be, based, and to appreciate that medicine is highly dependent on concepts
of probability. 

• Within their competence, health workers need to interpret laboratory tests 
and bedside observations and measurements in the light of a knowledge of 
physiological, observer and instrument variation. 

• Health workers must know and understand the statistical and epidemiological
facts about the etiology and prognosis of the diseases that they treat in
order to give the best advice to their patients about how to avoid or limit the 
effects of these diseases. 

• Health workers are the primary generators of the data on which health statistics
are based. They therefore need to know how data can and should be
used, both for the benefit of their own practices, and for the organization and 
delivery of health care in their countries. 

• Health managers need to know how to interpret and draw inferences from 
the indicators that describe health levels, trends and resources. 

• The study of statistics helps to foster in students the critical and deductive 
faculties that they will need throughout their studies and, after graduation, 
in their practices. 

Well organized training in statistics contributes to the long-term reorientation of 
country health information systems to respond to future health requirements, 
by improving data generation, handling, processing and use. 



The course 

Health statistics, as the basis of epidemiological methods, is the foundation upon
which health managers can assess health trends and situations, and monitor the 
progress of the various interventions. 

Some teachers may find that some of the topics they would like to teach are not 
included in this book. Others may find that topics they would not consider important
have been covered at length. The choice of topics was, in fact, based on
the Karachi recommendations, taking into account the consensus view of those 
preparing the outlines in consultation with teachers of the different cadres of 
health workers. It was felt that the topics not included in these outlines are 
generally best taught at the postgraduate level. 

Since the revised material is aimed at all trainee health workers, teachers will 
have to be selective as to what to include in the curricula of the different 
types of students. Many teachers may also find that not enough time is allocated 
for the statistics course to enable them to cover all the material. In such a 
situation, they should concentrate on what they regard as the high priority 
topics. 

In recognition of the diversity of teachers and teaching methods, the lesson and 
seminar outlines given here are deliberately presented in a variety of ways. An 
attempt has been made, however, in each case to include a clear statement of 
the aims and enabling objectives. Similarly, class exercises are presented in a 
variety of ways. The emphasis of the exercises should not be on computational 
skills but on the ability to interpret the results. 

References for use by teachers and students are given for each outline. Where 
examples are extracted from published documents or books, full references are 
given to allow teachers to refer to the original if they so wish. 

The outlines are intended to be a guide for teachers in preparing lessons and 
seminars, and in deciding on course content. They are not intended to be substitutes
for fully prepared lessons and seminars. Moreover, they are written
neither as self-instructional material for students, nor as a textbook in statistics 
for teachers lacking in formal statistical training. 

The outlines are divided into three parts: 

• Part I (Outlines 1 to 10) covers statistical principles and methods; 
• Part II (Outlines 11 to 16) covers health statistics, including demography and
vital statistics; 

• Part III (Outlines 17 to 23) covers statistics in medicine, including medical 
records. 

Handouts for students are appended to all the outlines. Teachers should judge 
whether the examples contained in the handouts are of relevance to their stu- 
dents and make any necessary adjustments. For example, if the data used in the 
handout are not applicable to the country in which the teaching is carried out, 
appropriate data for that particular country should be used instead. 

There is no fixed number of sessions for each lesson or seminar. Teachers should 
feel free to design lessons and seminars themselves, on the basis of the outlines. 
The number of sessions will depend on teachers' preferences and on the availability
of time. Time should be provided for class exercises.

The teaching of statistics should not be carried out in isolation from the other 
disciplines in the health curriculum, but should be integrated whenever possible.
The role of a statistics course in providing training in information support
for the health field should not be forgotten. Statistics should not be taught as 
an end in itself, but as a means through which other disciplines may be better 
understood and implemented. 



Statistical principles 
and methods 



OUTLINE 1 



Introduction to the role of statistics in 
health sciences and health care delivery 



Introduction to the lesson 

Statistical methods are consciously or subconsciously applied in health care delivery at the community
and individual patient levels. At the community level, they are used to monitor and
assess the health situation and trends, or to predict the likely outcome of an intervention programme.
At the patient level, they are used to arrive at the most likely diagnosis, to predict the
prognostic course and to evaluate the relative efficacy of various modes of treatment. Knowledge
of statistics is also essential for a critical understanding of the medical literature. Statistical
principles are essential for planning, conducting and interpreting biomedical, clinical and community
health research.



Objective of the lesson 

The objectives of this lesson are to introduce the students to the role of statistics in the health 
sciences, health care delivery, the study of human populations, and the management of uncertainty.
The lesson also aims to create an awareness of the need to acquire an understanding of
statistical principles and methods. 



Enabling objectives 

At the end of the lesson the students should be able to: 

(a) Explain various meanings of the term statistics. 

(b) Indicate, through examples (without necessarily going into great detail), how statistical 
principles and concepts are relevant in the following situations: 

• handling of variation in characteristics (for example, physiological or chemical) encountered
in the field of health care;

• diagnosis of patients' ailments and health problems of communities; 

• prediction of likely outcomes of disease intervention programmes in communities or of 
diseases in individual patients; 

• selection of appropriate forms of treatment for individual patients; 

• public health administration and planning; 

• planning, conducting, analysing, interpreting and reporting of medical research. 

(c) List sources of uncertainty in health sciences and health care delivery. 

(d) Describe the role of statistics in the management of uncertainty in health sciences and health
care delivery. 
Required previous knowledge

The foundation blocks of a course on health statistics are: 

• experience of making and using measurements that have demonstrated inter-individual and 
intra-individual variation, and variations due to observers, methods or instruments; 

• knowledge of the concerns of medicine and of health systems; 

• knowledge of the broad meaning of diagnosis, prognosis and treatment. 

Should any of these be missing, then that gap should be bridged before proceeding any further. 



Lesson content 

Meaning of statistics 

In everyday use the term statistics means data, numerical observations or quantitative
information, such as:

• the number of trained community health workers in different districts of a 
country; 

• the birth weight of babies born in a hospital during a specified period; 

• the number of homosexual men in a defined location who are HIV positive; 

• the prevalence rate of schistosomiasis per 1000 population in various districts
of a country;

• the amount of creatinine in mg per litre in a 24-hour urine specimen. 
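The prevalence rate per 1000 population in the list above is simple arithmetic: the number of existing cases divided by the population, multiplied by 1000. The Python sketch below is only an illustration. The district names and counts are hypothetical, and so is the helper `prevalence_per_1000`.

```python
def prevalence_per_1000(cases: int, population: int) -> float:
    """Prevalence rate per 1000 population: existing cases / population x 1000."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 1000


# Hypothetical district counts, invented for illustration only:
# (existing cases of schistosomiasis, total population)
districts = {
    "District A": (420, 35_000),
    "District B": (95, 19_000),
}

for name, (cases, population) in districts.items():
    rate = prevalence_per_1000(cases, population)
    print(f"{name}: {rate:.1f} per 1000 population")
```

Expressing both districts per 1000 population is what makes them comparable, even though their populations differ.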

Statistics is also defined as a discipline or a science of managing uncertainties in 
decision processes — the scientific methods of collecting, processing, reducing, 
presenting, analysing and interpreting data, and of making inferences and drawing 
conclusions from numerical data. 

Main uses of statistical methods 

Statistical methods are mainly used in the following three activities: 

(a) Collection of data in the best possible way by: 

• adopting a suitable and appropriate method for selecting subjects for study, 
to minimize the role of uncertainty (example: selecting people for a health 
interview by drawing lots);

• designing valid data collection instruments, such as questionnaires and 
schedules (example: construction of a questionnaire to collect data on 
anaemia in pregnant women); 

• organizing the data collection procedures for clinical and laboratory research,
epidemiological studies and population surveys to minimize the
chances of errors (example: standardization of definitions, and training of 
workers involved in collecting data on births and deaths). 

(b) Description of the characteristics of a group or a situation, accomplished
mainly by:

• data presentation in terms of tables and graphs; 
• calculating summary measures, such as averages, which can adequately
represent the structure of the data set. 

(c) Analysing data and drawing conclusions from such analysis: this involves 
analytical techniques and the use of probability concepts in drawing 
conclusions. 
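Activities (a) and (b) can be sketched together in Python. This is a minimal illustration, not a survey protocol. The antenatal register, the sample size of 10 and the haemoglobin values are all hypothetical. `random.Random.sample` stands in for drawing lots, which here means simple random sampling without replacement.

```python
import random
import statistics

# Hypothetical sampling frame: 200 pregnant women on an antenatal register,
# identified only by serial number (invented for illustration).
register = [f"ANC-{i:03d}" for i in range(1, 201)]

# (a) "Drawing lots": simple random sampling without replacement.
# A fixed seed makes the draw reproducible for teaching purposes.
rng = random.Random(2024)
selected = rng.sample(register, k=10)

# (b) Hypothetical haemoglobin values (g/dL) recorded for the 10 selected women.
haemoglobin = [11.2, 12.5, 10.8, 13.1, 9.9, 12.0, 11.7, 10.4, 12.8, 11.5]

# Summary measures that represent the structure of the data set.
mean_hb = statistics.mean(haemoglobin)
median_hb = statistics.median(haemoglobin)

print("Selected:", ", ".join(selected))
print(f"Mean haemoglobin: {mean_hb:.2f} g/dL")
print(f"Median haemoglobin: {median_hb:.2f} g/dL")
```

Selection by lots gives every woman on the register the same chance of being chosen. Drawing conclusions from the summary values, activity (c), would also require the probability concepts covered in later outlines.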

Uses of statistical concepts and methods in health sciences and health care delivery

The use of statistics is essential for making judicious decisions in health care 
delivery, at the levels of both the community and individual patients. Medicine, 
as a discipline, deals with individuals who exhibit variations in different 
characteristics, such as weight, blood pressure, cholesterol level and lung functions.
The healthy state for each characteristic varies from person to person depending
upon biological factors, such as the person's age, sex and genetic
constitution. Environmental factors, such as diet, stress and strain, lifestyles
and the availability of health facilities, can also affect these characteristics.
No two persons or groups of persons are ever exactly alike. Notwithstanding
these variations, decisions on the delivery of health care are based on experience
with other patients or communities with similar biological and social
characteristics. 

Because of these variations, the outcomes of decisions cannot be predicted 
exactly: they are always accompanied by an element of uncertainty. This is the 
probabilistic nature of medicine. It is thus necessary to be conversant with the 
proper techniques for dealing with such variations and uncertainties. 

Statistical skills are also helpful in developing a critical thinking faculty, in order 
to be able to: 

• think scientifically, logically and critically about health problems; 

• properly assess the available evidence for decision-making; 

• be aware of possible risks associated with medical decisions; 

• identify decisions and conclusions that lack a scientific and logical basis. 

Statistical principles and methods are applied in various aspects of health sci- 
ences and delivery of health care. For example: 

Handling of variation 

Variation in a characteristic (or factor or measurement) occurs when its value 
changes from subject to subject, or from time to time or instrument to instrument
within the same subject, or from observer to observer. Nearly all characteristics
encountered in health care delivery, whether environmental,
physiological, biochemical or immunological, exhibit such variation. For example,
there is variation in blood pressure from person to person, from morning
to evening, before and during excitement, in sitting and supine positions, in
recordings by different people, and in measurement by mercury and aneroid 
instruments. 

These variations require that appropriate methods be used when trying to: summarize
a characteristic for a group of patients or for a community; decide, for a
particular characteristic, the ideal or normal or average value; and compare two
groups of patients, or two communities, with respect to a particular characteristic.
Only when the various aspects of variation have been clearly defined can
appropriate statistical methods for summarizing or comparing characteristics (or 
factors or measurements) be decided on. 
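The three tasks named above can be sketched in Python: summarizing a characteristic for a group, indicating a "normal" range, and comparing two groups. The sketch makes several assumptions. The blood pressure readings are hypothetical. The helper `summarize` is invented for this example. The mean plus or minus 2 standard deviations range is only one common convention, and it covers roughly 95% of values only when the data are approximately normally distributed.

```python
import statistics

# Hypothetical systolic blood pressure readings (mmHg) from two communities,
# invented for illustration only.
community_a = [118, 124, 131, 115, 127, 122, 135, 120]
community_b = [112, 119, 126, 110, 121, 117, 130, 114]


def summarize(values):
    """Mean, sample standard deviation and an approximate mean +/- 2 SD range.

    The +/- 2 SD range covers roughly 95% of values only when the data are
    approximately normally distributed; it is one convention, not the rule.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    return mean, sd, (mean - 2 * sd, mean + 2 * sd)


for label, readings in (("Community A", community_a), ("Community B", community_b)):
    mean, sd, (low, high) = summarize(readings)
    print(f"{label}: mean {mean:.1f}, SD {sd:.1f}, approx. range {low:.1f} to {high:.1f}")

# A first, purely descriptive comparison of the two communities.
difference = statistics.mean(community_a) - statistics.mean(community_b)
print(f"Difference in means: {difference:.1f} mmHg")
```

With samples this small, the difference in means says nothing yet about whether the communities really differ. That question requires the inferential methods of later outlines.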

Diagnosis of patients' ailments and communities' health problems 

Diagnosis is the process of identifying the factors responsible for a specific 
disease in an individual or group of individuals. Distinct disease entities, based 
on clustering of signs, symptoms and values of biochemical measurements, are 
often established by procedures employing implicit statistical methods. 
There is always a risk of being wrong in identifying the health status of an individual
or a community with one of the diagnostic categories. The signs and symptoms
may not be fully typical of a particular diagnosis and may occur in more
than one diagnostic category. For example, complaints of abdominal pain, vomiting
and constipation of long duration are frequently seen in abdominal tuberculosis
but can also occur in amoebiasis and hepatitis. The reason for high maternal
mortality in a given area could be malnutrition of the women (because of ignorance
or poverty) or poor sanitary practices at the time of delivery (because of
ignorance, poverty or the unavailability of adequate maternal services). 

Statistical reasoning is often unconsciously employed when a disease category 
is selected as being the most likely to be correct. Explicit statistical methods are 
available for ordering disease categories according to their probabilities of being 
the correct diagnosis. 
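One such explicit method is Bayes' theorem. Under this rule, the probability of each candidate diagnosis given the presenting complaints is proportional to the following product: how common the condition is among such patients (the prior), multiplied by how likely the complaints are if the patient has that condition (the likelihood). The Python sketch below uses the abdominal example given earlier. All probabilities are hypothetical. The sketch also assumes that the three candidates are exhaustive and mutually exclusive, which a real differential diagnosis rarely guarantees.

```python
# Hypothetical figures, invented for illustration only:
# prior      = proportion of such patients at this clinic with each condition;
# likelihood = P(abdominal pain, vomiting and long-standing constipation | condition).
candidates = {
    "abdominal tuberculosis": {"prior": 0.2, "likelihood": 0.6},
    "amoebiasis": {"prior": 0.5, "likelihood": 0.2},
    "hepatitis": {"prior": 0.3, "likelihood": 0.1},
}


def rank_diagnoses(candidates):
    """Order diagnoses by posterior probability using Bayes' theorem.

    Posterior is proportional to prior x likelihood, normalised over the
    candidate list (assumed exhaustive and mutually exclusive).
    """
    joint = {d: v["prior"] * v["likelihood"] for d, v in candidates.items()}
    total = sum(joint.values())
    posterior = {d: p / total for d, p in joint.items()}
    return sorted(posterior.items(), key=lambda item: item[1], reverse=True)


ranked = rank_diagnoses(candidates)
for diagnosis, probability in ranked:
    print(f"{diagnosis}: {probability:.2f}")
```

In this hypothetical case, amoebiasis is the most common condition beforehand. Abdominal tuberculosis nevertheless ranks first, because the complaints are much more typical of it.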

Prediction of likely outcome of an intervention programme in a community or 
of treatment of individual patients 

Prognosis is the assessment or prediction of the likely outcome of an intervention
programme in a community or of disease in patients in the light of the
presenting symptoms, signs and circumstances. An outcome is predicted when 
the chances of its occurrence are high and the associated uncertainty is low. The 
exercise of prediction is thus inherently statistical. 

Data are needed to achieve a more reliable prediction of the likely outcome of 
an intervention programme in a community or of treatment of individual patients.
Characteristics observed at the outset of the programme or at initial examination
and during treatment, and the eventual outcome of the disease in the
community or in patients previously seen by the clinician must, therefore, be 
recorded and kept. The records can then be analysed to determine the trends in 
the results for different types of communities or individuals. Prediction of the 
outcome of a new intervention programme or treatment is based on the results 
of such an analysis. For example, to assess the effect of improved water supply 
on the health of a community, information is needed on the health problems of 
the community before the introduction of the improved water supply. 
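The analysis of kept records described above can be illustrated with a short Python sketch. It groups previously seen patients by a characteristic observed at initial examination. For each group, it reports the proportion with a favourable outcome, which can serve as a rough predicted probability for a new patient in the same group. The records and the helper `outcome_proportions` are hypothetical. With only four patients per group the estimates are very unreliable, so real records would need far larger numbers.

```python
from collections import defaultdict

# Hypothetical records of previously treated patients, invented for illustration:
# (severity at initial examination, recovered?)
records = [
    ("mild", True), ("mild", True), ("mild", False), ("mild", True),
    ("severe", False), ("severe", True), ("severe", False), ("severe", False),
]


def outcome_proportions(records):
    """Proportion of favourable outcomes for each presenting category."""
    counts = defaultdict(lambda: [0, 0])  # category -> [recovered, total]
    for category, recovered in records:
        counts[category][0] += int(recovered)
        counts[category][1] += 1
    return {category: rec / total for category, (rec, total) in counts.items()}


proportions = outcome_proportions(records)
for category, p in proportions.items():
    print(f"{category}: {p:.2f} of past patients recovered")
```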

Selection of appropriate intervention for a patient or a community 
This is based on the following: 

• previous experience with similar patients or communities that had received 
the intervention; 
• reports of clinical trials or experiments to assess the relative efficacy of different
drugs and other methods of treatment;

• objective assessment of the health worker's previous experience. 

The design, execution and analysis of medical experiments and intervention 
programmes must employ sound statistical and epidemiological principles and 
methods if the findings and conclusions are to be valid. Otherwise, ineffective
and even harmful interventions may be used unknowingly.
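The most basic summary of a trial comparing two treatments is the proportion of patients cured in each arm. These proportions can then be expressed as a difference and a ratio. The Python sketch below uses hypothetical counts, and the helper `compare_arms` is invented for this example. It gives point estimates only, with no confidence intervals and no significance test. It also cannot replace the sound design (randomization, comparable groups) that the paragraph above calls for.

```python
def compare_arms(cured_a: int, n_a: int, cured_b: int, n_b: int) -> dict:
    """Cure proportions in two treatment arms, with their difference and ratio."""
    p_a = cured_a / n_a
    p_b = cured_b / n_b
    return {
        "cure proportion, drug A": p_a,
        "cure proportion, drug B": p_b,
        "risk difference (A - B)": p_a - p_b,
        "risk ratio (A / B)": p_a / p_b,
    }


# Hypothetical trial: 80 of 100 patients cured on drug A, 60 of 100 on drug B.
result = compare_arms(80, 100, 60, 100)
for measure, value in result.items():
    print(f"{measure}: {value:.2f}")
```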

Public health, health administration and planning 

The major application here is the use of data relating to health and illness in the 
population in order to make a community diagnosis. This requires knowledge 
of: 

• population characteristics, such as size, and age and sex structure; 

• the health profile of the population in terms of disease risk factors; 

• the influence of environmental factors on different aspects of health; 

• other factors affecting population dynamics: data on births, deaths and 
migration. 

In health administration and planning, use is also made of data on the distribution
of health care resources (need, availability, utilization and so on) by different
segments of the population at all levels.
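Several of the population characteristics listed above reduce to simple indices, such as crude birth and death rates per 1000 population and the sex ratio. The Python sketch below illustrates them. The district figures are hypothetical, and so is the helper `per_1000`. The sketch treats the given total as the mid-year population. The rate of natural increase shown here ignores migration, the third component of population dynamics mentioned in the list.

```python
def per_1000(events: int, mid_year_population: int) -> float:
    """Events per 1000 mid-year population."""
    return events / mid_year_population * 1000


# Hypothetical one-year district figures, invented for illustration only.
males, females = 126_000, 124_000
population = males + females  # assumed to be the mid-year population
births, deaths = 9_000, 3_000

cbr = per_1000(births, population)  # crude birth rate
cdr = per_1000(deaths, population)  # crude death rate
natural_increase = cbr - cdr        # ignores migration
sex_ratio = males / females * 100   # males per 100 females

print(f"Crude birth rate: {cbr:.1f} per 1000")
print(f"Crude death rate: {cdr:.1f} per 1000")
print(f"Rate of natural increase: {natural_increase:.1f} per 1000")
print(f"Sex ratio: {sex_ratio:.1f} males per 100 females")
```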

Health workers need to know how to interpret and use these statistical indices. 
As the main generators and users of these statistics, they have to ensure that the 
statistics they use are accurate. 

Planning, conducting, analysing, interpreting and reporting of medical research 

All medical studies, whether in the form of analytical research or descriptive 
surveys, depend on proper collection, analysis and interpretation of relevant 
data. The validity of such studies depends on the application of sound statistical 
and epidemiological principles. 

In order to keep abreast of developments in their profession, health workers 
must be able to read, understand and critically evaluate medical reports. 



Key words

Data collection, processing, reduction, summary, analysis and presentation; health statistics;
probabilistic approaches; probability; statistics; uncertainty and error; variation; vital statistics.



Structure of the lesson 

The lesson content may be presented in the following sequence. Examples taken from current 
medical literature as well as other publications (including daily newspapers when appropriate) 
should be used liberally throughout the lesson to illustrate the importance of information
in health science and in the delivery of health care. 
(a) As an introduction, discuss the general and specific objectives of the course as a whole,
making it clear that it is not intended to produce health statisticians, but health workers who 
will be able to make rational decisions in their work. Emphasize the use of statistics as a tool 
rather than as an end. Give an overview of the course, its structure, organization, teaching 
methods and timetable. 

(b) Explain the meaning of "statistics" and "statistical methods", giving examples of their appli- 
cation in health care. Explain the need for data in decision-making. Hence, explain the 
importance of the study of survey design, instrument calibration, and data collection, processing,
analysis, presentation, interpretation and communication.

(c) Discuss the problems posed by variation and uncertainty in: the study of disease etiology, 
causation or risk factors; the evaluation of response to treatment; the determination of "nor- 
mal", "usual" and "ideal" values of characteristics; and hence the methods needed to handle 
them. 

(d) Explain the essential role of statistics in the field of health (for example, in acquiring and 
using medical knowledge, and in medical practice). Use examples to show how decisions are 
made by health workers in the course of their duties (for example, in making a diagnosis, 
assessing prognosis and deciding on the correct treatment for a patient), and by health
administrators, planners and evaluators.

(e) Point out the widespread use of statistical methods in medical journals. Progressive health 
workers depend, to a considerable extent, on literature to update their knowledge. Sometimes
the handouts distributed by pharmaceutical firms also contain statistical results.
Readers, therefore, need to have the ability to evaluate the validity and reliability of the 
information in these reports. They also need to be familiar with the basic technical language 
of the statistical and epidemiological methods which are commonly used in the medical 
literature. Health workers themselves may have to use this language in the reports of their 
work. 



Lesson exercises 

Lesson exercises should test the students' ability to describe the importance and uses of statistical
methods in the field of health, and should give as many examples as possible. The exercises
should, therefore, be of such a nature as to elicit from the students examples of the need for 
statistics and their use in solving health problems. 



■ What statistics would be required to decide whether to build a clinic for a
village in a district?



■ Give four areas in health care delivery where the science of statistics is 
applied. 



■ Describe the importance of including somebody with knowledge of statistics 
in a District Health Management Team (DHMT) which has the responsibility of 
health services development in a district. 
HANDOUT 1.1



Sources of uncertainty in health and medicine 



(a) Uncertainty is caused by variations in: 

• biological factors (for example, age, sex, birth order, heredity); 

• environmental factors (for example, nutrition, addictions, stresses, water supply, sanitation, socioeconomic 
status, availability and use of health facilities); 

• methodological factors (for example, relating to observers, instruments, laboratory techniques, chemicals and 
reagents, questionnaires or record forms, diagnostic tools such as X-ray); 

• chance and unknown factors (for example, difference in birth weight of identical twins, or varying results from 
repeated samples of blood, urine and tissue). 

(b) Other sources of uncertainty include: 

• incomplete information on the person or patient (patient in coma, lack of facilities for medical investigations, 
illiteracy, recall failure, etc.); 

• an imperfect tool (false positive and false negative results of laboratory and radiological investigations, clinical 
signs and symptoms are sometimes not specific, lack of accepted measure of important concepts such as 
community health, etc.); 

• poor compliance with the prescribed regimen (non-compliance with treatment schedule, imperfect post- 
surgical care, breakdown of a vaccine cold chain, non-acceptance of family planning advice, etc.); 

• inadequate medical knowledge (lack of treatment for AIDS, unknown causes of many cancers, inability to 
restore severely malignant tissues, lack of a universally applicable cheap and effective method to break the 
parasite-vector-host cycle in malaria transmission, unknown specific factors causing women to live longer 
than men, unknown relationship between the mind and physiological and biochemical mechanisms, etc.). 
OUTLINE 2



Health data: sources, levels and quality 
of measurement 



Introduction to the lesson 

The systematic and continuous process of health policy formulation, planning, programming,
budgeting, implementation, and general integration of different programmes within the
overall health system depends on good information support. The types of data gathered and
the analysis applied to them depend on the potential users and the kind of information they
are likely to need. The quality of the information given depends on the sources of the data,
how they are collected, the instruments (equipment, recording forms, etc.) used for data
collection, and the statistical methods used for analysis. In addition, the amount of information
that can be obtained from the data and the choice of statistical methods for the analysis
depend in part on the scale or level (nominal, ordinal, interval or ratio) on which the data have
been measured.

Objective of the lesson 

The objective of this lesson is to enable the students to understand the nature, sources, types 
and collection of data needed for the planning and management of health programmes and 
activities. 

Enabling objectives 

At the end of the lesson the students should be able to: 

(a) Describe the possible sources of health data. 

(b) Distinguish between regular and ad hoc health data collection systems. 

(c) Describe the procedures for health data collection. 

(d) Discuss the major differences between the three data measuring procedures: 

• instrumental: measurement done technically, without human intervention in the decision
on the value of the measurement;

• human: measurement done by persons who decide on the measurement value;

• by interaction between humans and equipment. 

(e) Explain the concepts of reliability and validity with regard to measurement, and discuss 
their implications for the use of health data. 

(f) Distinguish between the four principal scales of measurement (nominal, ordinal, interval
and ratio), indicating their respective application for health data collection. 

(g) Distinguish between quantitative and categorical data.

(h) Distinguish between random errors and fluctuations. 
Required previous knowledge

The meaning of "statistics" and "statistical methods", and their role in the health field, as
discussed in Outline 1.



Lesson content 

Sources of health data 

There are two main sources of health data: regular or routine systems; and ad 
hoc systems. 

Regular or routine data collection systems 

A regular or routine data collection system usually consists of established
procedures for collecting data as they become available. Some systems are national
with legal backing or are demanded by international regulations, while others 
are subnational or even institution-specific. 

Examples: 

• A national vital statistics registration system of births, deaths, marriages and 
divorces. 

• A disease notification system to collect information under the International 
Health Regulations¹ on cholera, plague and yellow fever.

• A reporting system for cancer cases (cancer registry). 

• Registration systems in health care facilities, to collect information on
patients attending the various clinics (in-patient and out-patient).

An advantage of this source is that health data are readily available. A major difficulty is
that such a system may not exist. Even where it exists, there may be
deficiencies. The records may not be uniform, or they may be unreliable because they
are incomplete or inaccurate.

Ad hoc data collection systems 

Ad hoc data collection is usually in the form of a survey to gather information 
that is not available on a regular basis. This may include special investigative 
studies or merely the collection of additional information as part of routine data 
collection. 

Examples: 

• A national survey of health personnel. 

• A survey to estimate the proportion of children with malnutrition in a
defined population.

• A study to investigate whether the use of hormonal contraceptives affects 
the nutritional status of the user. 

• An investigation of breastfeeding practices among women who registered a 
birth in the previous year. 



See Handout 2.2, 
One advantage of the ad hoc system is that it can provide accurate and reliable data
in response to the specific needs of the user. Disadvantages include the logistics
and expenses involved in ad hoc data collection.

Procedures for data collection 

Regular or routine system 

The procedure for regular data collection is usually along the following lines
(but not necessarily in the stated order):

• Decision on items of data to be collected according to the requirements of the 
health information system (for example, health programme monitoring, 
management of the health system). 

• Establishment of rules and regulations instituting the system, and giving it 
legal backing, especially if it is to be a nationwide system. These rules and 
regulations are enacted by a competent authority. 

• Physical establishment of office facilities, recruitment of personnel, and
dissemination of appropriate information to the public.

• Design of forms and registers to be used for recording information. 

• Specification of the recording procedure (for example, who supplies the 
information, when the information has to be registered). 

• Specification and design of registration receipts: these are tokens given to the 
person registering an event to indicate compliance. Examples are hospital 
registration cards, birth and death registration certificates. 

• Training of personnel.

Ad hoc system 

The steps involved in the organization of data collection on an ad hoc basis are: 

• Definition or statement of the objectives of the collection exercise, indicating 
what type of information is needed, the data to be collected, and how the 
information is to be used. 

• Definition of the population for which information is required (the reference 
or target population). 

• Decision on whether information will be collected from all or some of the 
units in the reference population. 

• Decision on how many respondents (those from whom information will be 
collected) are to be included in the study (sample size). 

• Decision on how these respondents will be selected. 

• Design of the instruments (forms, etc.) to be used for data recording. 

• Selection and training of personnel to collect the data. 

• Mode of data collection (for example, personal interview, self-administered 
questionnaire, telephone). 

• Identification of selected units and data collection. 
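The step "Decision on how these respondents will be selected" can be illustrated with simple random sampling, one common selection method and an assumption of this sketch rather than something the outline prescribes. The Python sketch below draws a fixed number of units from a sampling frame without replacement, so every unit has the same chance of selection. The frame of household identifiers and the function name select_simple_random_sample are hypothetical. In class, the same selection can be made by hand with a table of random numbers.

```python
import random

def select_simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from a sampling frame, each unit having the same
    chance of selection (simple random sampling without replacement).
    Hypothetical helper for illustration only."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the number of units in the frame")
    rng = random.Random(seed)  # a seeded generator lets the selection be reproduced
    return rng.sample(frame, n)

# Hypothetical sampling frame: 120 household identifiers from a village listing
households = [f"HH-{i:03d}" for i in range(1, 121)]
selected = select_simple_random_sample(households, 10, seed=2024)
print(sorted(selected))
```

Recording the seed alongside the survey documentation allows the same list of selected units to be regenerated later if it is lost.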



TEACHING HEALTH STATISTICS 






Data measuring procedures 

There are three main types of data measuring procedures: 

(a) Instrumental: measurement is done technically, without human
intervention in the decision on the value of the measurement.

Examples: electronic equipment such as weighing scales, thermometers, 
spectrophotometers, sphygmomanometers, blood testing equipment. 

(b) Human: measurement is done by persons who decide on the measurement 
value. 

Examples: auscultation of the heart, grading spleen enlargement, taking a 
patient's medical history, reading mercury column sphygmomanometers, 
reading weight using a spring scale. 

(c) Combination of human and instrumental. 

Examples: reading of X-ray films, reading of blood films. 

Quality of measuring instruments 

Two desirable characteristics of data measuring procedures are reliability and 
validity. 

Reliability 

Reliability deals with the inherent performance of the procedure. A reliable 
procedure is one that gives consistent results when it is applied more than once 
to the same subject under similar conditions. Major factors affecting reliability 
are: 

(a) The inherent variation of the procedure itself. Examples: fluctuating zero 
mark in a weighing scale, non-stability of chemical reagents. 

(b) Fluctuations in the variable being measured. Examples: patients giving
different answers, depending on their understanding of the questions, during
history taking.

(c) Observer error: a single observer may obtain different results in repeated
measurements of the same unit. Examples: repeated blood pressure
measurements; age determination (when date of birth is unknown); repeat
microfilaria count on a stained slide; temperature reading off a mercury
thermometer.

(d) Inter-observer error (observer variation): differences between observers. 
Examples: blood pressure measurements; reading of X-rays; reading of blood 
films. 
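Observer error and inter-observer error can be summarized in a simple way by the proportion of subjects on whom two readings agree. The Python sketch below assumes hypothetical readings of the same ten X-ray films by two observers; the same function applies when one observer reads the films twice. This simple proportion does not allow for agreement expected by chance alone, so it is offered only as an illustration, not as a recommended index.

```python
def proportion_agreement(readings_a, readings_b):
    """Proportion of subjects on whom two sets of readings give the same result.
    Hypothetical helper for illustration only."""
    if len(readings_a) != len(readings_b) or not readings_a:
        raise ValueError("both readings must cover the same, non-empty set of subjects")
    agreements = sum(1 for x, y in zip(readings_a, readings_b) if x == y)
    return agreements / len(readings_a)

# Hypothetical readings of the same 10 X-ray films ("pos" = abnormal, "neg" = normal)
observer_1 = ["pos", "neg", "neg", "pos", "neg", "neg", "pos", "neg", "neg", "neg"]
observer_2 = ["pos", "neg", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg"]
print(proportion_agreement(observer_1, observer_2))  # 0.8 (the observers differ on films 3 and 7)
```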

Validity 

A measurement is valid if it measures what it is supposed to measure. (It is
easier to illustrate the concept of validity by identifying situations in which a
measurement may not be valid.)

Examples: 

• Fever may not be a valid "measure" (sufficient indicator) for malaria in areas 
with low malaria transmission levels. 
• Answers obtained from oral interviews in some societies may not be
indicative of local abortion practices.

• A married couple not having a child may not be a valid "measure" (indicator) 
of infertility. 

Sensitivity and specificity are two important components of assessing the
validity of "measuring" procedures.

The sensitivity of a test, a procedure, or a measuring instrument, is its ability to
respond to changes in the factor to which it is being applied. For example, if
a chemical concentration is being measured, and if a change in the
concentration produces a large change in the measurement given by a test, the test is said
to be sensitive. In epidemiology, sensitivity is defined as the proportion of truly
positive subjects that are correctly identified as positive by a test. Using the notation shown in
Table 2.1, sensitivity is given by the relation a/(a + c).

Specificity is defined as the extent to which a test, a procedure, or a measuring 
instrument gives a response for the presence of a given variable and is
non-responsive to the presence of all other variables. In epidemiology, specificity is
defined as the proportion of true negatives correctly identified by a test. Using 
the notation shown in Table 2.1, specificity is given by the relation d/(b + d). 

Other components of validity in screening tests are the positive and negative
predictive values of a test. These values may be described using the notation
for test validation shown in Table 2.1. The probability that a positive result in the
test indicates a genuinely positive result is the positive predictive value of the
test: a/(a + b). The probability that a negative result is genuinely negative is
the negative predictive value of the test: d/(c + d).



Table 2.1 Notation for test validation

                        True picture
Test results       +          -          Total
+                  a          b          a + b
-                  c          d          c + d
Total            a + c      b + d

Sensitivity = a/(a + c)

Specificity = d/(b + d)

Positive predictive value = a/(a + b)
Negative predictive value = d/(c + d)
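The four relations in Table 2.1 can be checked with a short calculation. This Python sketch assumes hypothetical counts for 1000 screened persons; the class name ValidationTable is illustrative only.

```python
from dataclasses import dataclass

@dataclass
class ValidationTable:
    """Cell counts in the notation of Table 2.1 (hypothetical helper)."""
    a: int  # test positive, truly positive
    b: int  # test positive, truly negative
    c: int  # test negative, truly positive
    d: int  # test negative, truly negative

    def sensitivity(self):
        return self.a / (self.a + self.c)

    def specificity(self):
        return self.d / (self.b + self.d)

    def positive_predictive_value(self):
        return self.a / (self.a + self.b)

    def negative_predictive_value(self):
        return self.d / (self.c + self.d)

# Hypothetical screening results for 1000 persons
result = ValidationTable(a=90, b=30, c=10, d=870)
print(f"Sensitivity               = {result.sensitivity():.3f}")                # 0.900
print(f"Specificity               = {result.specificity():.3f}")                # 0.967
print(f"Positive predictive value = {result.positive_predictive_value():.3f}")  # 0.750
print(f"Negative predictive value = {result.negative_predictive_value():.3f}")  # 0.989
```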



Variables and attributes 

A quantitative variable describes a characteristic in terms of a numerical value; 
the value may vary from subject to subject or from time to time in the same 
subject. The value is expressed in units of measurement. Examples: height in 
metres, blood pressure in mmHg (or kPa). 
A qualitative or attributive variable describes the attribute of a characteristic (by
classifying it into categories to which a subject either belongs or does not
belong) or a property or quality that a subject either possesses or does not possess.
Examples: access to some form of health care, sickness, hospitalization, blood 
group, sex. 

Some characteristics can be dealt with in only one way as attributes, while
others are amenable to transformation from measurement variable to
descriptive attributes. For example, body weight may be studied either as a
measurement variable (weight in kg) or as an attribute (overweight/not overweight).
Which form is used depends on the reason for the measurement, the
requirements of objectivity, reliability and validity, and the properties of the different
measurement scales. These considerations are explained below.
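As a sketch of how one characteristic can be handled either way, the Python example below keeps body weight as a measurement and then reduces it to the attribute overweight/not overweight. It assumes that overweight is judged from the body mass index (weight in kg divided by the square of height in metres) with a cut-off of 25 kg/m², the usual adult convention; the function names and the three adults are hypothetical. The attribute discards information: two persons with very different measurements can fall into the same category.

```python
def body_mass_index(weight_kg, height_m):
    """Body mass index in kg/m^2: a measurement variable."""
    return weight_kg / height_m ** 2

def overweight_attribute(weight_kg, height_m, cut_off=25.0):
    """Reduce the same measurements to an attribute (overweight / not overweight).
    The default cut-off of 25 kg/m^2 is an assumption of this sketch."""
    if body_mass_index(weight_kg, height_m) >= cut_off:
        return "overweight"
    return "not overweight"

# Hypothetical adults: (weight in kg, height in metres)
adults = [(82.0, 1.75), (60.0, 1.70), (70.0, 1.60)]
for weight, height in adults:
    print(f"{weight} kg, {height} m -> BMI {body_mass_index(weight, height):.1f}, "
          f"{overweight_attribute(weight, height)}")
```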



Continuous and discrete variables 

A continuous variable is one with potentially an infinite number of possible 
values in any interval. It can assume either integral or fractional values and 
can be measured to different levels of accuracy by using more or less refined 
methods of measurement. Examples: height (in metres): 1.8, 1.76, 1.758; weight
(in kg): 11, 10.8, 10.79.
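The statement that a continuous variable can be measured to different levels of accuracy can be illustrated by rounding, which in this Python sketch stands in for a less refined measuring method (an assumption made for illustration). The values are those quoted in the text.

```python
# The same height and weight recorded with progressively less refined methods;
# rounding to fewer decimal places mimics a coarser measuring instrument.
height_m = 1.758
weight_kg = 10.79

for places in (3, 2, 1):
    print(f"height to {places} decimal place(s): {round(height_m, places)} m")
for places in (2, 1, 0):
    print(f"weight to {places} decimal place(s): {round(weight_kg, places)} kg")
```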

A discrete variable can only have a finite number of values in any given interval.
The values are whole numbers (integers). Examples: number
of children in a family; number of households in a community; white blood cell 
count; number of beds in a hospital ward. 



Scales of measurement 

It is necessary to express clinical impressions (for example, extent of an individual's 
illness or the health level of a community) in clear measurements, either in 
units of some physical device, or categories such as disease stage. Each level of 
measurement is, however, defined by the degree of accuracy and sophistication 
of the measuring device. 

The four principal scales used to measure data are the nominal, ordinal, interval 
and ratio scales. 

A nominal or classificatory scale is one in which names, labels or tags are given 
to distinguish one measurement from another on the basis of certain qualities 
or attributes. Measurement on this scale does not include any notion of 
magnitude. 

Examples: 

• Outcome of disease in a patient can be measured as survival or death. 

• National commitment to primary health care may be judged as existent or 
non-existent. 

• Psychiatric patients may be classified as psychotics, neurotics, manic 
depressives or schizophrenics. 

An ordinal or a ranking scale has the characteristics of the nominal scale 
described above, with an implicit order relationship among the measurements. 
Examples:

• Lack of proper food for nursing mothers and children in a drought-stricken 
area may be classified as critical, severe, moderate or slight. 

• Social status of patients may be measured as upper, middle or lower class. 

An interval scale is characterized by a numerical unit of measurement, such 
that the difference between any two measurements is explicitly known in 
terms of an interval between the two measured points. The unit of
measurement and the zero point (the origin or starting point) of the scale interval are
arbitrary and only fixed by convention. Example: body temperature is usually 
measured on an interval scale, of which the unit may be, for example, degrees 
Celsius (°C). 

Although the ordinal scale can be transformed into a pseudo-interval scale 
by assigning scores to its measurement categories, it retains the qualities of an 
ordinal scale. 

The ratio scale has all the characteristics of the interval scale, as well as a true or 
absolute zero, so that the ratio between two values on the scale is a meaningful 
measure of the relative magnitude of the two measurements. Examples: height 
in metres; weight in kg. 

Certain arithmetical operations are permissible on each scale. 

Nominal scale. The arithmetical operation of "equivalence" is permissible on this 
scale. For example, one "woman" is equivalent to another "woman". Equivalent 
measurements can be aggregated into a particular category and counted.
Proportions belonging to each measurement category out of the total number
measured can be calculated.

Ordinal scale. On this scale one measurement can be equal to another
(equivalent) or described as greater (higher) than or less (lower) than the other. The
difference between one measurement and another is not explicit, and
differences between adjacent measurements are not equivalent. Again, equivalent
measurements can be aggregated into a measurement category and counted, 
and the proportion in each category calculated. 
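The operations allowed on nominal and ordinal measurements (aggregating equivalent measurements into categories, counting them and calculating proportions) can be sketched in Python as follows. The observations are hypothetical and the function name category_proportions is illustrative. For the ordinal example an explicit ordering of the categories is supplied; it permits "greater than/less than" statements but says nothing about the size of the differences.

```python
from collections import Counter

def category_proportions(observations, categories):
    """Count observations in each category and express each count as a
    proportion of the total, listing categories in the order given."""
    counts = Counter(observations)
    total = len(observations)
    return [(cat, counts[cat], counts[cat] / total) for cat in categories]

# Nominal scale (hypothetical blood groups): the listing order is arbitrary
blood_groups = ["O", "A", "O", "B", "AB", "O", "A", "O"]
for row in category_proportions(blood_groups, ["A", "B", "AB", "O"]):
    print(row)

# Ordinal scale (hypothetical food shortage ratings), from most to least serious
ORDER = ["critical", "severe", "moderate", "slight"]
areas = ["moderate", "severe", "slight", "moderate", "critical", "moderate"]
for row in category_proportions(areas, ORDER):
    print(row)

# The order supports "more serious than" comparisons only
rank = {level: i for i, level in enumerate(ORDER)}
print(rank["severe"] < rank["moderate"])  # True: severe is more serious than moderate
```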

Interval scale. Arithmetical operations permissible on this scale include all those 
allowed on the ordinal scale; in addition, measurements can be added, subtracted, 
divided and multiplied by a constant, to yield interpretable results. Comparison 
between intervals on this scale is meaningful, and is independent of the unit of 
measurement or the system of assigning scores. 

Ratio scale. All arithmetical operations are permissible, and the ratio of any two 
measurements is meaningful and independent of the unit of measurement. 
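The difference between interval and ratio scales can be checked numerically. In this Python sketch (hypothetical readings; the pound conversion factor is approximate), a comparison of intervals in body temperature gives the same result in degrees Celsius and degrees Fahrenheit, whereas the ratio of two temperatures changes with the unit because the zero point is arbitrary. The ratio of two body weights, measured on a ratio scale, does not depend on the unit.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

def kg_to_pounds(kg):
    """Convert kilograms to pounds (approximate factor)."""
    return kg * 2.20462

# Interval scale: hypothetical body temperatures in degrees Celsius
t1, t2, t3 = 35.0, 37.5, 40.0
f1, f2, f3 = (celsius_to_fahrenheit(t) for t in (t1, t2, t3))

# Comparison between intervals: the same in both units
print((t3 - t2) / (t2 - t1), (f3 - f2) / (f2 - f1))  # 1.0 1.0
# Ratio of two readings: changes with the unit, so it is not meaningful
print(round(t3 / t1, 3), round(f3 / f1, 3))

# Ratio scale: hypothetical body weights in kg; the ratio does not depend on the unit
w1, w2 = 20.0, 60.0
print(w2 / w1, round(kg_to_pounds(w2) / kg_to_pounds(w1), 6))  # 3.0 3.0
```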

Quantitative and categorical data 

Data can be divided into two broad categories according to the strength of the 
scale of measurement: categorical data and quantitative data. 

Categorical data are measurements in which the notion of magnitude is absent 
or implicit. Such variables are measured either on a nominal or an ordinal scale. 
These data are also referred to as attributive or qualitative. 
Quantitative data have numerical magnitude. They are measured either on an
interval or on a ratio scale. 




Ad hoc sources of data; attribute; categorical data; continuous scale; descriptive statistics; discrete variable; objective measurement; predictive value (positive and negative); qualitative description; quantitative data; quantitative description; reliability; routine sources of data; scales of measurement; sensitivity; specificity; subjective measurement; validity; variable.



Structure of the lesson 

The contents of the lesson may be covered in the following sequence, using illustrative examples 
from the literature, wherever possible. 

(a) Explain the meaning and importance of health data and their place in medical information 
and knowledge. 

(b) Discuss the different types of health data sources, and the systems and procedures of data 
collection. Explain their relative usefulness, quality of data, and cost. 

(c) Differentiate between routine and ad hoc data collection systems. Discuss the status and 
usefulness of the following data collection systems: vital registration, disease surveillance, 
service reporting, specific health programmes, and administration in your locality. 

(d) Differentiate between quantitative and qualitative description, objective and subjective
measurement criteria, and quantitative and categorical or attributive data.

(e) Describe the four scales of measurement and explain their properties in terms of the amount 
of information conveyed, reliability and validity. 

(f) Describe the various instruments for measuring data and how to assess their qualities.

Lesson exercises 

The teacher should show data sets on about four or five variables from different sources. The 
exercises should focus on the identification of the source, distinguishing between continuous 
and discrete data, and identification of the scales of measurement. Additional exercises should 
test the understanding of validity, reliability and types of errors that may be associated with the 
data sets collected. 

Ask class members to do the following exercises. 



■ List three variables or attributes used in the area of public health, and for 
each one, state: 

• its scale of measurement; 

• the type of variable (quantitative, attributive); 

• whether it is discrete or continuous; 

• the type of instrument used for its collection. 
■ List five items of information collected within the national health
information system through:

• a regular data collection system;

• ad hoc survey methods. 



■ Describe three situations in which measurements are made in the clinic or
public health setting (one for each data measuring procedure). State the factors 
that may cause the procedures to be unreliable. 



■ Describe how data are classified in general, and the different scales of
measurement of data.



■ Distinguish between sensitivity and specificity of a diagnostic test.



■ Illustrate with two examples how a variable can be measured on more than
one scale. 
Definitions of new terms and concepts



Attribute: An attribute is a variable that describes a characteristic by classifying it into categories to which a subject 
either belongs or does not belong. It is a property or quality that a subject either possesses or does not 
possess. 

Categorical variable: A categorical variable has measurements in which the notion of magnitude is absent or 
implicit. 

Continuous variable: A continuous variable is one with potentially an infinite number of possible values in any 
interval. 

Discrete variable: A discrete variable has measurements that occur as integers. It can only have a finite number of 
values in any given interval. 

Reliability: Reliability deals with the inherent performance of the procedure. A reliable procedure is one that gives 
consistent results when it is applied more than once to the same unit under similar conditions. 

Sensitivity: The sensitivity of a screening test is a measure of the probability of correctly diagnosing a case. It is 
the proportion of truly diseased persons in the screened population who are identified as diseased by the 
screening test. 

Specificity: Specificity is a measure of the probability of correctly identifying a non-diseased person with a screening 
test. It is the proportion of persons who are not truly diseased, who are so identified by the screening test. 

Validity: A measurement is valid if it measures what it is supposed to measure. 

Variable: A variable is any attribute, phenomenon or event that can have different values. 
HANDOUT 2.2



International vaccination requirements¹



International Health Regulations 

The purpose of the International Health Regulations, adopted by the World Health Assembly in 1969, is to help 
prevent the international spread of diseases and, in the context of international travel, to do so with the minimum of 
inconvenience to the passenger. This requires international collaboration in the detection and reduction or elimination
of the sources from which infection spreads rather than attempts to prevent the introduction of disease by legalistic 
barriers that over the years have proved to be ineffective. Ultimately, however, the risk of an infective agent becoming 
established in a country is determined by the quality of the national epidemiological services and, in particular, by 
the day-to-day national health and disease surveillance activities and the ability to implement prompt and effective 
control measures. 

No regulations can be expected to foresee every disease eventuality and, in certain situations, diseases and conditions 
other than those covered by the International Health Regulations may be of concern to national health authorities 
and the travelling public. The International Health Regulations obviously cannot refer specifically to diseases that 
were not known at the time they were last revised; this is the case with acquired immunodeficiency syndrome 
(AIDS). Nevertheless, any requirement of an HIV antibody test certificate ("AIDS-free certificate") is contrary to the 
Regulations. 

The International Health Regulations are currently being revised in accordance with a resolution adopted by the World 
Health Assembly in 1995. The purpose of the revision is to develop Regulations that are adapted to the present 
volume of international traffic and trade and take account of current trends in the epidemiology of communicable 
diseases, including emerging disease threats. 

Smallpox 

The eradication of smallpox was confirmed by WHO nearly 20 years ago. Smallpox vaccination is no longer indicated, 
and may be dangerous to those who are vaccinated and those in close contact with them. 

Cholera 

Vaccination against cholera cannot prevent the introduction of the infection into a country. The World Health Assembly 
therefore amended the International Health Regulations in 1973 so that cholera vaccination should no longer be 
required of any traveller. 

The traditional parenteral cholera vaccine conveys incomplete, unreliable protection of short duration and its use is 
therefore not recommended. 

Yellow fever vaccination certificate 

Urban and jungle yellow fever occur only in parts of Africa and South America. Urban yellow fever is an epidemic viral 
disease of humans transmitted from infected to susceptible persons by the Aedes aegypti mosquito. Jungle yellow 
fever is an enzootic viral disease transmitted among nonhuman primate hosts, and occasionally to humans, by a 
variety of mosquito vectors. 



¹ Adapted from International travel and health. Geneva, World Health Organization, 1998: Chapter 2.




TEACHING HEALTH STATISTICS © WORLD HEALTH ORGANIZATION 1999 











A yellow fever vaccination certificate is now the only certificate that should be required in international travel, and 
then only for a limited number of travellers. 

Many countries require a valid International Certificate of Vaccination from travellers arriving from infected areas or 
from countries with infected areas, or who have been in transit through those areas. Some countries require a
certificate from all entering travellers, including those in transit. Although there is no epidemiological justification for this
latter requirement, which is clearly in excess of the International Health Regulations, travellers may find that it is 
strictly enforced, particularly for people arriving in Asia from Africa or South America. 

On the other hand, vaccination is strongly recommended for travel outside the urban areas of countries in the yellow 
fever endemic zone, even if these countries have not officially reported the disease and do not require evidence of 
vaccination on entry. 

The vaccination has almost total efficacy, while the case-fatality rate for the disease is more than 60% in adults 
who are not immune. Tolerance of the present vaccine is excellent. The only contraindication to its use, apart from 
true allergy to egg protein, is cellular immunodeficiency (congenital or acquired, the latter sometimes being only 
temporary). 

The period of validity of an international certificate of vaccination against yellow fever is 10 years, beginning 10 days
after vaccination. If a person is revaccinated before the end of this period, the validity is extended for a further 10 
years from the date of revaccination. If the revaccination is recorded on a new certificate, travellers are advised to 
retain the old certificate for 10 days until the new certificate becomes valid. 
OUTLINE 3



Health information systems 



Introduction to the lesson 

A health information system (HIS) provides information for the management of a health
programme or system and for monitoring health activities. A HIS is made up of mechanisms and
procedures for acquiring and analysing data and providing information (such as management 
information, health statistics and health literature) needed by: 

• all levels of health planners and managers for the planning, programming, budgeting,
monitoring, control, evaluation and coordination of health programmes;

• health care personnel, health research workers and educators in support of their respective 
activities; 

• national policy makers, socioeconomic planners and the general public outside the health 
sector. 

Objective of the lesson 

The objective of this lesson is to provide the students with an understanding of the importance
of information-based management of health services and of the health care delivery system.

Enabling objectives 

At the end of the lesson the students should be able to: 

(a) Explain the importance of information-based health services management. 

(b) Describe the role of health personnel in the health data generation process. 

(c) Identify the relevant sources of data for a HIS. 

(d) Describe the various types of HIS (for example, public, hospital, private sector). 

(e) Describe the various levels of a HIS.

(f) Describe the uses of a HIS in decision-making.

Required previous knowledge 

Sources of health data; the structure of the health care delivery system; the International
Classification of Diseases.



Lesson content 

Definition and description of a health information system 

A HIS is made up of mechanisms and procedures for acquiring, analysing, using 
and disseminating health data for health management. 
The role of a health information system in health services

The main use of a HIS is to support decision-making by helping to identify
areas for action, set priorities and evaluate the results of the decisions
made.

Decision-making process based on health information 

The decision-making process consists of the following steps:

• identification of issues; 

• examination of relevant information for allocating priorities to the problems; 

• setting of goals; 

• selection of possible solutions to the problems; 

• deciding which solutions to implement; 

• implementation of the chosen solutions; 

• assessment of the results. 

The role of health personnel in generating health data 

All categories of health personnel are involved in the generation and use of 
health information at all levels of the health system. Accurate recording of health 
events, good record keeping, correct analysis and prompt reporting of health 
data determine the quality and usefulness of the HIS. 

Types of health data collection system 

Health data are either collected routinely or obtained through ad hoc exercises 
(see Outline 2). Data are usually collected from all the health reporting units. In 
certain circumstances, however, sentinel reporting units can be established to 
provide special or additional information for the HIS. 

Sentinel reporting units are specially selected health units used as observation 
windows of the health data reporting process. The selected units may 
receive special attention and support because of their HIS-related activities. 

Relevant sources of data (health information subsystems) 

A HIS may be divided into five different subsystems: 

• disease surveillance; 

• service reporting; 

• specific health programmes; 

• administration; 

• vital registration and census. 

Various types of HIS may exist in the same country: the national (public) system,
private sector information systems and others. Each type of system
often has several levels, from the community level (small health units), through
the district, regional or provincial levels, to the national level. The system relies
on feedback procedures to improve the quality of data and information
continuously.

Health information system data management 

The data generated by a HIS have to be managed correctly and efficiently to yield
the desired information. This management is carried out at the micro (collection 
point) level and macro (district, national, etc.) level. Data management involves 
collation, checking on accuracy and completeness, storage, processing, analysis, 
report generation and information communication. Computers are useful tools 
for data management at all levels of a HIS. 

Desirable characteristics of a health information system 

The key desirable characteristics of a HIS are that it should be: 

• used by and cover all levels of the health system; 

• affordable and manageable; 

• flexible, functional, useful, reliable and relevant. 
Decision-making; health information system; health policy; health programmes; health system
management; feedback; disease surveillance; vital registration.



Structure of the lesson 

When presenting this lesson, the teacher should: 

(a) Clearly define what a HIS is, explaining the roles played by the users of the health system, 
the health care providers and the managers. 

(b) Discuss the need for information-based decision-making, and the importance of an
information system and its subsystems in this process.

(c) Build up the description of a HIS from the simple information system of a health unit 
in a small community (for example, the village health worker's information system), 
through other referral health facilities in a district, to the region, and finally to the national 
level. 

(d) Describe the information systems of the departments of a health facility, then the system of
the whole facility, and finally the complex national system.

(e) Prepare a package of the forms used by the national HIS to show to the students. Prepare a
handout listing the forms (see Handout 3.2).

(f) Draw examples from the students' environment to illustrate: 

• the structures (from the peripheral units, medical records departments of hospitals, up to 
the headquarters); 

• management (dates and frequency of reporting, local analysis and feedback mechanisms); 

• use (development of indicators and setting of priorities) of a HIS. 
(g) Explain the types and training of the personnel of the system, with particular reference to
the country of interest to the students. For example, the personnel of the HIS may consist of 
the following: 

• Medical records department: records officer, assistant records officers, records assistants, 
statistical assistants. 

• National health statistics unit: medical statistician, records officer, biostatistics assistants, 
computer system analysts, computer programmers. 

(h) Describe the health data collecting forms in use in the HIS (see Handout 3.2). If possible, the 
forms should be reviewed for any improvements that may be warranted. 

(i) Explain the reporting system and all the legislation regarding health information
reporting.

(j) Explain the use of computers in a HIS, for storage, retrieval and processing of health 
data. 



Lesson exercises 

The teacher should set exercises that test the students' knowledge of the various components of 
a HIS, the relevant forms for data collection, the factors that affect the quality of the data, and 
the usefulness of the data. 



■ List six important forms in use in one health information subsystem of your 
country. For each form, describe the information to be derived and how it is 
used. 



■ Give five factors that can affect the quality and timeliness of the information reported
by the HIS.



■ Select one of the forms in use in the HIS and describe its use, covering:

• frequency of reporting; 

• latest date for submission of forms; 

• channel of reporting; 

• required local analysis; 

• feedback. 



■ Describe the characteristics, advantages and disadvantages of a sentinel
reporting system in your country.



■ Describe the contribution of the nongovernmental sector to the national HIS. 
HANDOUT 3.1



Definitions of new terms and concepts 



Disease surveillance: The continuing scrutiny of all aspects of occurrence and spread of diseases to detect changes 
in trends or distribution, as a basis for instigating control measures. 

Feedback: The process by which information is passed back to the people who provided the data. To be effective, the
information should be accompanied by useful analytical comments.

Health information system (HIS): The mechanisms and procedures for acquiring and analysing data, and
providing information (for example, management information, health statistics, health literature) for the
management of a health programme or system, and for monitoring health activities.

Health policy: A set of statements and decisions defining health priorities and main directions for attaining health 
goals. 

Health system management: The management of the interrelated component parts, both sectoral and intersectoral, 
as well as within the community itself, which produce a combined effect on the health of a population. 

Vital registration: The formal recording of events of human life, such as births, deaths, marriages and divorces. 
HANDOUT 3.2



List of health forms used in a national health 
information subsystem for disease 
surveillance in Ghana 



Monthly report on inpatients 

For all health units admitting patients: data on discharges and deaths, by diagnosis. 

Report on outpatients (summary of outpatient daily register) 

Patients are counted by diagnosed disease, by age (whether under or over 5 years), and by whether or not they were 
referred to another department or health institution. 

Consolidated monthly maternal and child health (MCH) reports 

Summary of MCH monthly data; antenatal clinic; daily register of children under 5 years, including immunizations and 
malaria prophylaxis. 

Number of first or subsequent visits by normal or underweight children; first visits to antenatal clinic and for
complications of pregnancy; number and type of vaccine doses given to children under 5 years.

Monthly report of infectious diseases 

Data are classified as: (a) diseases for which quarantine is required; (b) diseases under global surveillance; and
(c) communicable diseases.

Regional/district summary monthly reports of health inspection 

Information is collected from inspections of villages, buildings, meat, drugs issued by health assistants, immunization 
programmes, infectious disease cases, sanitation improvements and water supply protection. 

Annual report on health personnel 

Information is collected on all professional and technical staff, according to type of work, age, sex, and marital status. 
OUTLINE 4



Organization and presentation of data 



Introduction to the lesson 

Useful information is usually not immediately evident from a mass of raw data. Collected data 
need to be organized in such a way that the information they contain clearly reveals the patterns 
of variation. Precise methods of analysis can be decided upon only when the data structure and 
characteristics are understood. 

Objective of the lesson 

The objective of this lesson is to provide the students with an understanding of the purpose and 
ways of reducing and presenting data. 

Enabling objectives 

At the end of the lesson the students should be able to: 

(a) State the circumstances under which health data reduction and presentation would be 
necessary. 

(b) Recognize the relative advantages and disadvantages of tabular and diagrammatic
presentation of data.

(c) Explain the uses and the methods of construction of: 

• an ordered array; 

• frequency tables (absolute, relative and cumulative); 

• cross-tabulations; 

• bar charts; 

• pie charts; 

• histograms; 

• frequency polygons (line charts); 

• an ogive. 

(d) Tabulate a given set of data using an appropriate method. 

(e) Use an appropriate diagrammatic method to present tabulated data. 

(f) Describe at least three ways in which diagrammatic presentation of data can be misused.

Required previous knowledge 

Sources and types of health data and scales of measurement. 
Lesson content

Organization of data 

The different ways (manual or computer-based) and stages of bringing together
data items (assembling data collection forms, extracting the data from the
forms to create data sets or files, etc.).



Reasons for health data reduction and presentation 

To provide a concise or compact view of the data set and its principal
characteristics. This is the first step in the description and analysis of statistical data (see
Handout 4.2, Example 1).



Tabular and diagrammatic data presentation 

Data grouping 

Classes or intervals into which the range of the variable has to be divided,
including the class limits and class marks.
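A minimal Python sketch of how the true class limits and the class midpoint follow from the stated class marks, using the terminology of Handout 4.1 and assuming each reading is rounded to a known precision (1 mmHg for Table 4.1, 0.1 for a class such as 3.0-3.3). The function names are illustrative only and do not come from any package.

```python
def true_class_limits(lower_mark, upper_mark, precision):
    """True (lower, upper) limits of a class stated as lower_mark-upper_mark,
    when readings are rounded to `precision` (1 = whole numbers, 0.1 = tenths)."""
    half = precision / 2
    return lower_mark - half, upper_mark + half


def class_midpoint(lower_mark, upper_mark, precision):
    """Midpoint of the class, halfway between its true limits."""
    lower, upper = true_class_limits(lower_mark, upper_mark, precision)
    return (lower + upper) / 2


# Class 12-15 mmHg of Table 4.1, with readings in whole mmHg
print(true_class_limits(12, 15, 1))  # (11.5, 15.5)
print(class_midpoint(12, 15, 1))     # 13.5
```

Adjacent classes then share a limit (15.5 ends the class 12-15 and begins the class 16-19), so the classes are mutually exclusive and leave no gaps. For readings to the nearest tenth, the class 3.0-3.3 has true limits of 2.95 and 3.35.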



Tabular presentation 

Presentation of data in tables so as to organize them into a compact and readily 
comprehensible form. For example, a frequency distribution table gives the 
number of observations at different values of the variable. 

(a) Single variable frequencies: 

• for a qualitative variable (such as the distribution of occupation among 
the 188 people in the study in Handout 4.2, Example 2); 

• for a large data set on a quantitative variable, requiring grouping of the 
data into classes (such as the distribution of intra-ocular pressure of the 
right eye, in Handout 4.2, Example 1). 

(b) Cross-tabulation: 

• Two-dimensional tables, in which two variables are cross-classified (such
as the cross-classification of angular stomatitis and occupation, as shown
in Table 4.3 of Handout 4.2).

• Three-dimensional tables, in which three variables are cross-classified (for 
example, outcome of treatment by sex and by age group). 
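A short Python sketch of cross-tabulation: counting every combination of two categorical variables from individual records. It uses only the seven rows actually printed in Table 4.2 (persons 1-4 and 186-188), so the counts are illustrative and do not reproduce Table 4.3. The helper `cross_tabulate` is hypothetical.

```python
from collections import Counter

# The seven records printed in Table 4.2, as (occupation, angular stomatitis):
# P/S/U = professional/skilled/unskilled; "+" = present, "-" = absent.
records = [("P", "-"), ("U", "+"), ("U", "+"), ("S", "-"),   # persons 1-4
           ("S", "+"), ("U", "-"), ("P", "-")]               # persons 186-188


def cross_tabulate(records, row_levels, col_levels):
    """Return table[row][column] = number of records with that combination.
    Every listed level appears, even when its count is zero."""
    counts = Counter(records)  # counts each (row, column) pair
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}


table = cross_tabulate(records, row_levels=["P", "S", "U"], col_levels=["+", "-"])
for occupation, row in table.items():
    print(occupation, row["+"], row["-"], sum(row.values()))
# P 0 2 2
# S 1 1 2
# U 2 1 3
```

Listing the levels explicitly, rather than taking them from the data, keeps empty cells in the table, which matters when a category happens not to occur in a small data set.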



Diagrammatic presentation 

Diagrammatic presentation is the use of a diagram to show the distribution of 
data. Some methods of diagrammatic presentation of data which should be cov- 
ered in this lesson are: 

(a) For qualitative or categorical data: 

• Pie charts 

— A circle is divided into sectors with areas proportional to the frequencies 
or the relative frequencies of the categories of the variable. 
• Bar charts

— The bars are constructed to show the frequency, or relative frequency, 
for each category of the attribute. 

— Usually, the bars are equal in width. 

— It is important that the vertical scale should start at zero; otherwise the 
heights of the bars are not proportional to the frequencies. 

( b ) For quantitative data: 

• Frequency histograms 

— The chosen class intervals should not overlap and should cover the full 
range of the data. 

— The area of each bar (not its height) should be proportional to the 
frequency. Unequal class intervals are taken into account by the area 
of the bars. 

• Frequency polygons (line charts) 

— Constructed by joining the midpoints of the top of each bar. 

— Provide ease of visual comparison between two or more distributions 
drawn on the same chart. 

• Cumulative frequency polygons and cumulative frequency charts (ogives)

— The cumulative frequencies are plotted against the upper tabulated 
limit for each class. 

— In principle, the ogive can be used to estimate, by interpolation, the
frequency of occurrence of values of the variable less than or equal to
a specified value, for example the percentage of a population ≤30 years
of age.

(c) Others: 

• Maps 
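The rule that bar area, not bar height, must be proportional to frequency can be sketched in Python as below. As an assumption for illustration only, the classes 20-23 and 24-27 of Table 4.1 (Handout 4.2) are merged into one wider class, 20-27, so that the class widths are unequal. Each bar's height is then its frequency divided by its class width.

```python
# (lower true limit, upper true limit, frequency) in mmHg.
# Assumption for illustration: Table 4.1's classes 20-23 and 24-27 are merged
# into a single wider class 20-27 (13 + 2 = 15 observations); the empty 0-3
# class is left out.
classes = [
    (3.5, 7.5, 1),
    (7.5, 11.5, 16),
    (11.5, 15.5, 63),
    (15.5, 19.5, 40),
    (19.5, 27.5, 15),
]


def histogram_heights(classes):
    """Bar height = frequency / class width (observations per mmHg), so that
    bar AREA, not height, is proportional to the class frequency."""
    return [freq / (upper - lower) for lower, upper, freq in classes]


heights = histogram_heights(classes)
for (lower, upper, freq), h in zip(classes, heights):
    print(f"{lower}-{upper} mmHg: frequency {freq}, bar height {h}")
```

If it were drawn with a height of 15, the merged class would look nearly as prominent as the 8-11 class (16 observations). However, its observations are spread over twice the range, so its correct height is 1.875 rather than 4.0 observations per mmHg.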

Examples should be given of the misuse of diagrammatic presentation of data: 

— Complexity: presenting too much information on one diagram. 

— Suppression of zero on the vertical scale may lead to misrepresentation of 
the appearance of changes. 

— Choice of scale: stretching or suppressing the scale can mislead readers. 
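How suppressing the zero exaggerates a change can be shown with a few lines of Python. The two values (80 and 90) and the axis starting point (75) are hypothetical numbers chosen only for illustration.

```python
def apparent_ratio(a, b, axis_start):
    """Ratio of the drawn bar heights for values a and b when the vertical
    axis starts at axis_start instead of zero."""
    return (b - axis_start) / (a - axis_start)


a, b = 80, 90                     # hypothetical values, e.g. coverage (%) in two years
print(b / a)                      # true ratio: 1.125
print(apparent_ratio(a, b, 0))    # axis starting at zero: 1.125
print(apparent_ratio(a, b, 75))   # axis starting at 75: 3.0
```

A 12.5% increase is drawn as a threefold one once the first 75 units of the scale are cut away.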

Labelling of tables and diagrams 

The need for proper (self-explanatory) titles for tables and diagrams should
be emphasized. Table columns and rows, and the axes of diagrams, should be
labelled.

Advantages and disadvantages 

Tabular presentation has the advantage of displaying the characteristics of the
data more clearly than raw data do. Diagrammatic presentation gives a better visual
appreciation of the characteristics than tabular presentation.

The disadvantage of both tabular and diagrammatic presentations is that, since 
both are based on summarized data, individual values are lost. 
Bar chart; class; class interval; class limits; class marks; cross-tabulation; cumulative frequency;
cumulative relative frequency; diagrammatic presentation; frequency; frequency polygon; frequency
table; histogram; ogive; ordered array; pie chart; relative frequency; tabular presentation.



Structure of the lesson 

Throughout the lesson, examples should be given to illustrate the various methods of data
presentation. Whenever possible, the use of computers for data tabulation and diagrammatic
presentation should be demonstrated and practised by the students. It would be helpful to use
public-domain, general-purpose software (for example, Epi Info) for the demonstrations.
The teacher should:

(a) Explain the use of the ordered array for showing distribution patterns in a small set of 
observations. 

(b) Explain the use of the frequency table for showing distribution patterns, and show how 
frequency tabulations can be produced either manually or by computer. In particular, 
mention: 

• the number of classes sufficient to show the distribution patterns; 

• the dependence of class interval or size on the number of classes and range of values; 

• the use of equal or unequal classes (intervals) in relation to distribution patterns; 

• the use of open-ended classes in order to cope with extreme values in a distribution, and 
to ensure that the classification is exhaustive at both ends of the distribution (even if these 
end classes contain no observations); 

• the proper statement of class intervals so that classes are mutually exclusive and 
exhaustive. 

(c) Explain the use of the relative frequency distribution (for example, percentage distribution)
for comparing two or more distribution patterns. Note that relative frequencies and relative
cumulative frequencies in the tables are useful for comparative purposes. Stress that it is
better to use several simple tables than one complicated table.

(d) Explain the use of cross-tabulation to obtain the frequency distribution of one variable by 
subsets of another variable (for example, age and sex). 

(e) Describe the use of the cumulative frequency distribution and how it is obtained from a basic 
frequency table. Make special note of the way in which the definition of class marks is 
modified in a table of cumulative frequencies. 

(f) Explain the concept of the distribution pattern of a variable. Give examples from the medical
literature to illustrate distributions of different shapes (bell-shaped, unimodal, bimodal,
skewed, etc.), and what these distribution patterns may suggest for disease transmission.
(g) Explain how distribution patterns of data are more readily discerned by using diagrams
instead of tabulated data. Describe how the following diagrams should be constructed, either
manually or using a computer, from data given in frequency tables: frequency histogram,
frequency polygon, cumulative frequency polygon and cumulative frequency chart (ogive),
bar chart, pie chart.
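The steps from a basic frequency table to a cumulative frequency distribution and an ogive reading can be sketched in Python using the class frequencies of Table 4.1 (Handout 4.2). The sketch assumes that readings were recorded to the nearest whole mmHg, so each cumulative value is plotted at the true upper limit of its class (3.5, 7.5, ...). It also assumes that straight-line interpolation between ogive points is acceptable. The function name is illustrative.

```python
from itertools import accumulate

# Table 4.1 class frequencies; true upper limits assume whole-mmHg readings
upper_limits = [3.5, 7.5, 11.5, 15.5, 19.5, 23.5, 27.5]
frequencies = [0, 1, 16, 63, 40, 13, 2]

cum_freq = list(accumulate(frequencies))      # running totals
n = cum_freq[-1]                              # 135 subjects
cum_pct = [100 * c / n for c in cum_freq]     # cumulative relative frequency (%)

# Ogive points: 0% at the lower limit of the first class (-0.5),
# then one point per class at its true upper limit.
ogive_x = [-0.5] + upper_limits
ogive_y = [0.0] + cum_pct


def percent_below(x):
    """Estimated % of subjects with a true value below x, by straight-line
    interpolation between adjacent points of the ogive."""
    if x <= ogive_x[0]:
        return 0.0
    for i in range(1, len(ogive_x)):
        if x <= ogive_x[i]:
            x0, x1 = ogive_x[i - 1], ogive_x[i]
            y0, y1 = ogive_y[i - 1], ogive_y[i]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return 100.0


print(cum_freq)                         # [0, 1, 17, 80, 120, 133, 135]
print(round(percent_below(17.5), 1))    # 74.1, i.e. readings of 17 mmHg or less
```

Because the points are plotted at the true upper limits rather than at the class midpoints, each point answers the question "how many observations lie below this value?" This is the question an ogive is read to answer.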



Lesson exercises 

The teacher should organize a set of raw data suitable for tabular and graphic presentation. The 
exercises should give emphasis mostly to the process of data reduction, tabulation and graphic 
presentation, including the uses and interpretation of graphs and other diagrams. 



■ Using the data on intra-ocular pressure in Example 1 of Handout 4.2, indicate
the type(s) of diagram that would be appropriate for presenting the data on each
of the variables.



■ Using the data on intra-ocular pressure in Annex A (A.1), draw a frequency
polygon for one of the variables and produce cross-tabulations for any two
variables.



■ List four different graphic methods for presenting data from a survey on
family planning in a village. Illustrate each graphic method with two variables.
HANDOUT 4.1



Definitions of new terms and concepts 



Bar chart: Diagrammatic presentation of frequency data for nominal classes by bars whose length is proportional to 
the class frequencies. 

Class: One of the intervals into which the entire range of the variable has been divided (for example, each of the
intervals 3.0-3.3, 3.4-3.7, ..., 5.0-5.3 is a class).

Class frequency: The number of observations in each class, also known as the absolute class frequency. 

Class limits: The true values at the beginning and end of each class, which depend on the accuracy of measurement
(for example, if measurement is accurate to the nearest tenth, then the class limits for the class 3.0-3.3 are
2.95 and 3.35).

Class marks: The variable values that demarcate each class (for example, 3.0 and 3.3 are, respectively, the lower and
upper class marks of the class 3.0-3.3).

Classification: The process of subdividing the range of values of a variable into classes or groups. 

Cross-tabulation: A frequency table involving at least two variables that have been cross-classified (tabulated 
against each other). 

Cumulative class frequency: The number of observations up to the end of the particular class. It is obtained by 
cumulating the frequencies of previous classes, including the class in question. 

Frequency polygon: Diagrammatic presentation of the frequency distribution of a quantitative variable, with class
frequencies plotted against class midpoints, the points being joined by straight lines.

Frequency table or distribution: A tabular arrangement showing the number of times that data with particular 
characteristics occur within a data set. 

Histogram: Diagrammatic presentation of the frequency distribution of a quantitative variable, with areas of
rectangles proportional to the class frequency.

Ogive: Graph of the cumulative relative frequency distribution. 

Ordered array: Simple rearrangement of the individual observations in order of magnitude. 

Pie chart: Sectors of a circle, with areas proportional to class frequencies, used to present data in nominal classes. 

Relative class frequency: The absolute class frequency expressed as a fraction of the total frequency. 







TEACHING HEALTH STATISTICS ©WORLD HEALTH ORGANIZATION 1999 








HANDOUT 4.2 



Illustrative examples of data presentation 



Example 1 

Extract of data on intra-ocular pressure measurements of 135 adults (for the full data set see Annex A). The data are 
given in mmHg, but may also be expressed in kPa (1 mmHg = 0.133 kPa). 



Age (yrs)   Sex   Rt (a)   Lt (b)   Diff (c)   Potential for glaucoma
24          M     20       27       -7         high
52          M     18       12        6         high
26          M     16       13        3         low
71          F     14       14        0         normal
49          M     13       14       -1         normal
39          M     21       16        5         high
71          M     14       12        2         low
32          F     13       12        1         normal
38          F     13       12        1         normal
33          F      9        8        1         normal

(a) Right eye (mmHg).
(b) Left eye (mmHg).
(c) Difference between the right and left eye measurements.

Using seven equal intervals, the data on intra-ocular pressure in the right eye may be presented in a frequency
distribution table, as in Table 4.1.

Table 4.1 Frequency distribution table of intra-ocular pressure (right eye)

Intra-ocular pressure   Number of      Relative
(mmHg)                  observations   frequency (%)
0-3                       0              0
4-7                       1              0.7
8-11                     16             11.9
12-15                    63             46.7
16-19                    40             29.6
20-23                    13              9.6
24-27                     2              1.5
Total                   135            100.0


Evident features of the distribution of right eye intra-ocular pressure values among the 135 subjects studied
include the fact that all values lie between 4 and 27 mmHg, and that nearly half of the subjects (46.7%) have values
between 12 and 15 mmHg.
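The grouping behind Table 4.1 can be sketched in Python by applying the same seven classes of width 4 mmHg to the ten right-eye readings listed in the Example 1 extract. The full data set is in Annex A and is not reproduced here, so the counts differ from Table 4.1. The helper names are illustrative.

```python
from collections import Counter

# Right-eye readings (mmHg) of the ten subjects in the Example 1 extract
right_eye = [20, 18, 16, 14, 13, 21, 14, 13, 13, 9]


def class_label(value, width=4, start=0):
    """Stated class (e.g. '12-15') of a whole-number reading, for equal
    classes of the given width beginning at `start`, as in Table 4.1."""
    lower = start + ((value - start) // width) * width
    return f"{lower}-{lower + width - 1}"


def frequency_table(values, width=4, start=0, n_classes=7):
    """Rows of (class, frequency, relative frequency in %). Empty classes are
    kept so the table covers the full range; readings beyond the last class
    would be left out, so n_classes must be large enough."""
    counts = Counter(class_label(v, width, start) for v in values)
    labels = [class_label(start + k * width, width, start) for k in range(n_classes)]
    n = len(values)
    return [(lab, counts[lab], round(100 * counts[lab] / n, 1)) for lab in labels]


for label, freq, pct in frequency_table(right_eye):
    print(f"{label:>6} {freq:3d} {pct:6.1f}")
```

For these ten readings, five fall in the class 12-15 mmHg (50%), which is also the most frequent class in the full Table 4.1.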



Example 2 

In a (hypothetical) study of the relationship between angular stomatitis and occupation, the occupation of each 
individual was recorded as: P (professional), S (skilled) or U (unskilled). There were 88 people with angular stomatitis 
(+) and 100 without the disease (-). A partial list of the disease and occupation characteristics of the individuals 
could be represented as in Table 4.2. 



Table 4.2 Listing of data on angular stomatitis

Person   Angular stomatitis   Occupation
1        -                    P
2        +                    U
3        +                    U
4        -                    S
...      ...                  ...
186      +                    S
187      -                    U
188      -                    P

These data can be presented in a cross tabulation of the two variables: angular stomatitis (present/absent) and 
occupation (professional/skilled/unskilled) as in Table 4.3. 

Table 4.3 Distribution of 188 people by occupational classification 
and angular stomatitis 



                Angular stomatitis
Occupation      Present   Absent   Total   Percentage with disease
Professional       5        20       25      20.0
Skilled           13        30       43      30.2
Unskilled         70        50      120      58.3
Total             88       100      188      46.8



Table 4.3 shows the relative frequencies of angular stomatitis in the various occupational categories. 
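The last column of Table 4.3 can be reproduced with a few lines of Python. The percentage with disease in each row is the number with angular stomatitis divided by the row total. The function name is illustrative.

```python
# Counts from Table 4.3: occupation -> (present, absent)
table_4_3 = {"Professional": (5, 20), "Skilled": (13, 30), "Unskilled": (70, 50)}


def percent_with_disease(table):
    """Row percentages: 100 x present / (present + absent), plus the overall total."""
    rows = {occ: round(100 * p / (p + a), 1) for occ, (p, a) in table.items()}
    total_present = sum(p for p, _ in table.values())
    grand_total = sum(p + a for p, a in table.values())
    rows["Total"] = round(100 * total_present / grand_total, 1)
    return rows


print(percent_with_disease(table_4_3))
# {'Professional': 20.0, 'Skilled': 30.2, 'Unskilled': 58.3, 'Total': 46.8}
```

Dividing by row totals rather than by the grand total (188) lets occupational groups of very different sizes be compared directly.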



Example 3 

The data on right intra-ocular pressure in Example 1 may be presented as a histogram (Figure 4.1). 
Figure 4.1 Histogram of right eye intra-ocular pressure data from Example 1
Example 4

The intra-ocular pressure data from Example 1 can also be presented as a frequency polygon (Figure 4.2). Each data
point represents the frequency of one class.

Figure 4.2 Frequency polygon of the right eye intra-ocular pressure data from Example 1 




Example 5 

The occupational distribution of the 188 people in Example 2 can be presented as a bar chart (Figure 4.3).
Figure 4.3 Bar chart of occupation data from Example 2 



Example 6 

A pie chart (Figure 4.4) can also be used to show the distribution pattern of the occupation data in Example 2. The 
frequencies have to be converted, proportionately, to angles. 



Occupation      Frequency   Sectoral angle
Professional       25        A
Skilled            43        B
Unskilled         120        C
Total             188        360°



A = (25/188) × 360° = 48°;

B = (43/188) × 360° = 82°;

C = (120/188) × 360° = 230°.
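The same conversion of frequencies into sector angles, sketched in Python. Each angle is 360° multiplied by the category's share of the total. The function name is illustrative.

```python
def sector_angles(freqs):
    """Pie chart sector angles (degrees) proportional to the frequencies."""
    total = sum(freqs.values())
    return {category: 360 * f / total for category, f in freqs.items()}


angles = sector_angles({"Professional": 25, "Skilled": 43, "Unskilled": 120})
for category, angle in angles.items():
    print(f"{category}: {angle:.0f} degrees")
# Professional: 48 degrees
# Skilled: 82 degrees
# Unskilled: 230 degrees
```

The unrounded angles (about 47.9°, 82.3° and 229.8°) sum to exactly 360°. Rounding each to the nearest degree happens to preserve the total here, but that is not guaranteed for other data.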



Figure 4.4 Presentation of the occupation data from Example 2 as a pie chart 




Example 7 

Wrong or misleading diagrammatic presentation of data is illustrated in Figures 4.5 to 4.8. 

Figures 4.5 and 4.6 show DPT3 vaccination coverage (diphtheria-pertussis-tetanus vaccine, 3 doses) for infants in
Nicaragua for the period 1990-1996 (data from the Expanded Programme on Immunization, WHO). Figure 4.5 gives a
misleading impression of the increase in coverage between 1990 and 1996.



Figure 4.5 Inappropriate choice of scale and missing zero point on the vertical axis

Figure 4.6 Zero point included on the vertical axis




Figures 4.7 and 4.8 present mortality data from Nicaragua for 1991, as reported in World health statistics annual
1993. Figure 4.7 shows the number of deaths from infectious and parasitic diseases; the data are wrongly presented
because identical intervals on the horizontal axis have been used to represent age ranges of different widths (1, 4, 5
and 10 years). Each curve in Figure 4.8, drawn as a frequency polygon, shows the relative frequency of deaths due to
the specified cause in 10 different age groups. The graph is overcrowded and hence difficult to read.
Figure 4.7 Equal intervals on the horizontal axis for unequal data intervals (vertical axis: no. of deaths)

Figure 4.8 Crowded graph (vertical axis: relative frequency)
HANDOUT 4.3



Appropriate use of tabular and diagrammatic 
data presentation 



The following is a summary of appropriate methods of data presentation depending on the situation. 



Tabular methods 



Data situation                                               Tabular method
Small data set (for example, fewer than 20 observations)     Ordered array
Many individual observations involving only one variable     Frequency table
Individual observations involving two or more variables      Cross-tabulation


Diagrammatic methods appropriate to tabular data 


Tabular data                                               Diagrammatic method
Frequency table, quantitative variable, one set of data    Histogram or frequency polygon
Frequency table, quantitative variable, two sets of data   Frequency polygon
Frequency table, categorical data                          Bar or pie chart
OUTLINE 5



Measures of central tendency and location 



Introduction to the lesson 

Sets of measurements cannot be meaningfully and adequately described by the values of all 
the individual measurements. Appropriate summary indices must therefore be obtained. One 
type of index describes the "central" point (for example, an average of the values), or the most 
characteristic value, of the measurements. These are the measures of central tendency and 
location. 

Objective of the lesson 

The objective of the lesson is to define and discuss the indices of central tendency and other
measures of location (the mean, median, mode and quartiles), and their use, interpretation and limitations.

Enabling objectives 

At the end of the lesson the students should be able to: 

(a) Explain why summary indices are needed in medicine. 

(b) Compute the mean, median and mode of a given set of data (grouped and ungrouped). 

(c) Compute percentages for categorical data. 

(d) Discuss, with examples, the uses and limitations of the mean, median and mode, and their 
relative advantages and disadvantages as summary indices of health data. 

(e) Explain the use of quartiles and percentiles to summarize health data. 

(f) Select an appropriate measure of central tendency and location for a given application.

(g) Differentiate between "average", "normal" and "ideal" values, with reference to health data. 

Required previous knowledge 

Contents of Outlines 1 to 4 and, if computers are to be used, basic knowledge of their use. 



Lesson content 

The lesson should cover the definitions, calculation, relative advantages and
disadvantages, and appropriate data situations for use of the following:

Measures of central tendency 

• Arithmetic mean 

• Geometric mean 

• Weighted mean 
• Median

• Mode 

Other measures of location 

• Quartiles 

• Percentiles 

• Proportions 

The teacher should be able to construct an outline of this lesson content with
reference to the material in the proposed handouts. The following are
illustrative examples of the computations of some of these descriptive statistics.

Examples (such as those given below or in Handout 5.2) should be used
throughout the lesson. Examples based on real data, and on topics familiar to the
students, would be preferable.

Example 1: Arithmetic mean, median, mode, quartiles and 50th percentile 

Prior to an extensive health survey of a region in a certain country, the sampling
procedure required some knowledge of the household sizes in the survey
region. A (random) sample of 31 households was therefore selected, and the
number of residents in each household was as follows:



5   5   6   3   8   2   3   1   6   2   3   1   9   9   8   6
6   8   8   10  5   4   1   9   7   8   5   1   1   11  4

Calculate the arithmetic mean, median, mode, quartiles and 50th percentile of
the number of residents per household.

Arithmetic mean 

To calculate the arithmetic mean, there are two steps: 

First step: add all values to obtain the total number of people in the households:

5 + 5 + 6 + 3 + ... + 4 = 165. 

Second step: divide this total (165 people) by the number of households (31). 
Thus the arithmetic mean is 165/31 = 5.32 persons per household. 

Median 

To determine the median household size, there are also two steps: 

First step: arrange all values in order of magnitude (this arrangement is called an array):

1, 1, 1, 1, 1, 2, 2, ..., 9, 9, 9, 10, 11.

Second step: select the value which divides this distribution into two halves (that is,
the middle observation if the number of observations is odd, or the arithmetic mean
of the two middle observations if the number of observations is even).

Thus the median value for this distribution is the 16th observation in this array, 
with a value of 5 persons. 

Mode 

The mode is the value in an array with the highest frequency of occurrence.

Thus in the array of the household size data, there are two most frequently
observed numbers of household residents, each observed in 5 households: households
with a single resident and households with 8 residents. There are,
therefore, two modes, of 1 and 8 people per household.

The distribution is therefore described as bimodal. If there are more than two 
modes, then the distribution is said to be multimodal. 

Quartiles 

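These are the observations in an array that divide the distribution into four equal
parts. With 31 observations, they are the observations at positions (31 + 1)/4 = 8,
2 × (31 + 1)/4 = 16 and 3 × (31 + 1)/4 = 24. Therefore: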

— the first quartile in the household data is the 8th observation in the array, 
with a value of 3 persons; 

— the second quartile is the 16th observation in the array, with a value of 5 
persons; 

— the third quartile is the 24th observation in the array, with a value of 8 
persons. 

50th percentile 

The percentiles are the values in an array that divide the distribution into a hundred
equal parts. Thus the 50th percentile in the household data is the 16th
observation in the array, with a value of 5 persons.

Note that the 50th percentile corresponds to the second quartile and the 
median. 
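The Example 1 calculations can be checked with Python's standard `statistics` module. `quantiles(..., n=4)` uses the "exclusive" method by default, which places the quartiles at positions (n + 1)/4, 2(n + 1)/4 and 3(n + 1)/4, as above. Because n + 1 = 32 is divisible by 4, no interpolation between observations is needed. With other sample sizes, this function interpolates, and other software may use a slightly different quartile convention.

```python
from statistics import mean, median, multimode, quantiles

# Number of residents in each of the 31 sampled households (Example 1)
households = [5, 5, 6, 3, 8, 2, 3, 1, 6, 2, 3, 1, 9, 9, 8, 6,
              6, 8, 8, 10, 5, 4, 1, 9, 7, 8, 5, 1, 1, 11, 4]

array = sorted(households)             # the ordered array
print(len(array), sum(array))          # 31 165
print(round(mean(array), 2))           # 5.32 persons per household
print(median(array))                   # 5 (the 16th observation)
print(multimode(array))                # [1, 8], a bimodal distribution
print(quantiles(array, n=4))           # [3.0, 5.0, 8.0], i.e. Q1, Q2 (= median), Q3
```

`multimode` returns every value that shares the highest frequency, which is how the two modes of this bimodal distribution show up.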

Example 2: Weighted mean 

The mean ages (in months) of preschool children in different villages are presented
in Table 5.1.



Table 5.1 Mean ages of preschool children in different villages

Village    No. of children    Mean age (months)
1                54                 58.6
2                52                 59.5
3                49                 61.2
4                48                 62.5
5                48                 64.5

Calculate the weighted mean age of preschool children in all 5 villages.
Weighted mean 

The mean age of all the preschool children is found in two steps: 

First step: multiply the mean age for each village by the corresponding number
of children in that village (the weights), and add up the products thus
obtained:

(58.6 × 54) + (59.5 × 52) + (61.2 × 49) + (62.5 × 48) + (64.5 × 48)
= 15 353.2 months.

Second step: divide the total cumulative age of the preschool children (obtained
in the first step) by the total number of children in the villages:

15 353.2 months divided by (54 + 52 + 49 + 48 + 48) = 251 children.

Thus the weighted mean age = 15 353.2/251 = 61.17 months. 

The non-weighted mean ignores the fact that the number of children in each 
village is not the same (that is, unequal weights). In the above example, the 
non-weighted mean would be: 

(58.6 + 59.5 + 61.2 + 62.5 + 64.5) ÷ 5 = 61.26 months.
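The two steps of the weighted mean, and the contrast with the non-weighted mean, can be reproduced from the figures in Table 5.1. A minimal sketch in Python; the function name weighted_mean is illustrative only.

```python
def weighted_mean(values, weights):
    """Sum of (value x weight) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Table 5.1: mean age (months) and number of children in each village.
mean_ages = [58.6, 59.5, 61.2, 62.5, 64.5]
children = [54, 52, 49, 48, 48]

weighted = weighted_mean(mean_ages, children)        # step 1 and step 2 together
unweighted = sum(mean_ages) / len(mean_ages)         # ignores the unequal weights
print(f"weighted mean   = {weighted:.2f} months")
print(f"unweighted mean = {unweighted:.2f} months")
```

It prints 61.17 months for the weighted mean and 61.26 months for the non-weighted mean, matching the figures above.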



Array; arithmetic mean; bimodal distributions; measures of central tendency and location; median; 
mode; multimodal distributions; percentiles; quartiles; summary indices; weighted mean. 



Structure of the lesson 

The following sequence may be followed for this lesson. 

(a) Briefly review the relevant areas of data presentation already covered, in particular, 
frequency distributions, grouping of data and class intervals, and modality of frequency 
distribution. 

(b) Explain the need for numerical summary indices. Simple examples on the uses of the indices 
of central tendency in the health field should be given throughout the lesson. Whenever 
"normal" values are referred to in medicine, these values are indices (bench-marks) of the 
variables in question (for example, "normal" temperature, "normal" weight for age, etc.). 
Discuss the meaning of the terms ideal, common, optimum, typical and usual in relation to
the mean, median, and mode. For example, a weight may be the average for a group of people
(who may be poorly nourished), yet it may not be normal. Similarly, a normal value may not
be ideal for healthy and productive living.

(c) Explain the limitations of the indices of central tendency and location, using health-related 
illustrative examples. Lack of sensitivity of the median and mode to some aspects of data 
distributions should be pointed out, together with their usefulness in describing skewed 
distributions. The effect of outliers on the mean should also be explained. Mention the
need for other measures of location, such as quartiles or percentiles, when the
distribution is asymmetric or skewed.
(d) Illustrate the use of percentiles for child growth monitoring, using the standard
growth-monitoring chart.

(e) When discussing the computation of the indices, emphasize the underlying principles rather 
than the need to memorize formulae. The role of computers as facilitators of the
computations of these indices should be pointed out to the students. While use of computers should
be encouraged, the processes of computing the mean, median and mode should be explained 
in detail. 

(f) Procedural differences for grouped and non-grouped data should be pointed out. Explain
that for non-grouped data, the indices represent directly the sets of data they refer to, whereas 
if indices are computed from grouped data, they are only approximations. The effect of the 
level of accuracy of the data on estimated means and median (based on grouped data) should, 
therefore, be discussed. 

Determination of class mid-values, and assumptions concerning open-ended intervals, should 
be clearly explained. 
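For item (c), a short demonstration may help students see why the mean is sensitive to outliers while the median is not. A minimal sketch in Python; the body weights below are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical body weights (kg); the second list adds one extreme value (an outlier).
weights = [58, 60, 61, 62, 63, 65]
with_outlier = weights + [140]

for label, data in (("without outlier", weights), ("with outlier", with_outlier)):
    mean = statistics.fmean(data)
    median = statistics.median(data)
    print(f"{label:16s} mean = {mean:5.1f} kg   median = {median:5.1f} kg")
```

Adding one extreme value pulls the mean from 61.5 kg to about 72.7 kg, while the median moves only from 61.5 kg to 62 kg.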

Handout 5.1, giving definitions of the new terms and concepts introduced in this lesson, should
preferably be given to the students before the lesson.



Lesson exercises 

The exercises should emphasize the correct choice of the various measures of location and their 
interpretation. The teacher should, therefore, provide appropriate data to demonstrate the
different types of distributions and for the calculation of indices.



■ The following are data of daily attendance at a health centre for the month of 
November 1992. The health centre is open all day for outpatient services from 
Monday to Friday and for mornings only on Saturdays, and is closed on Sundays 
except for emergencies. 



Table 5.2 Data on the daily attendance at a health centre recorded for the month of
November 1992

Date & day      No. of      Date & day      No. of      Date & day      No. of
                patients                    patients                    patients
 1 Sunday          24       11 Wednesday       50       21 Saturday        47
 2 Monday          75       12 Thursday        80       22 Sunday          35
 3 Tuesday        100       13 Friday          96       23 Monday          84
 4 Wednesday      112       14 Saturday        58       24 Tuesday         90
 5 Thursday        77       15 Sunday          22       25 Wednesday       87
 6 Friday          74       16 Monday          98       26 Thursday        91
 7 Saturday        50       17 Tuesday         76       27 Friday          86
 8 Sunday          38       18 Wednesday       82       28 Saturday        49
 9 Monday         103       19 Thursday        69       29 Sunday          30
10 Tuesday        110       20 Friday          79       30 Monday          94



(a) Calculate the mean, median and mode for daily attendance. 

(b) Comment on the distribution on the basis of the values obtained in (a). 
(c) Using a class interval of 5, construct a frequency table of the daily
attendance. 

(d) From the table obtained in (c), calculate the mean and median. 

(e) Compare and comment on the values of mean and median obtained in (a) 
and (d). 

(f) Calculate the first quartile and the 75th percentile for the attendance.

(g) By which day of the month had the clinic seen 50% of the patient-load?
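For checking students' answers to parts (a) and (g), a minimal sketch in Python using the counts in Table 5.2. It covers only the ungrouped calculations; the grouped-data parts (c) to (f) depend on the class intervals used. The variable names are illustrative.

```python
import statistics
from itertools import accumulate

# Table 5.2: patients seen on each date of November 1992 (index 0 = 1 November).
attendance = [
    24, 75, 100, 112, 77, 74, 50, 38, 103, 110,   # 1-10 November
    50, 80, 96, 58, 22, 98, 76, 82, 69, 79,       # 11-20 November
    47, 35, 84, 90, 87, 91, 86, 49, 30, 94,       # 21-30 November
]

mean = statistics.fmean(attendance)
median = statistics.median(attendance)
modes = statistics.multimode(attendance)

# Part (g): first date on which the running total reaches half the month's patients.
half = sum(attendance) / 2
running = list(accumulate(attendance))
day_half = next(day for day, total in enumerate(running, start=1) if total >= half)

print(f"mean = {mean:.1f}, median = {median}, mode(s) = {modes}")
print(f"50% of the {sum(attendance)} patients had been seen by {day_half} November")
```

With these data the mean is 72.2 patients per day, the median 78 and the only repeated value (the mode) 50; half of the 2166 patients had been seen by 16 November. The mean lying below the median suggests that the low weekend counts pull the distribution to the left, which is the kind of comment part (b) asks for.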
Definitions of new terms and concepts



Arithmetic mean: The sum of all values of a set of observations divided by the number of observations. 

Geometric mean: A mean derived by multiplying together the n individual values in a series of observations and
calculating the nth root. The logarithm of the geometric mean is thus the arithmetic mean of the logarithms of the
individual values.

Measures of central tendency and location: Summary indices describing the "central" point, or the most 
characteristic value, of a set of measurements. 

Median: Value that divides a distribution into two equal halves; central or middle value of a series of observations 
when the observed values are listed in order of magnitude. 

Mode: The most frequently occurring value in a series of observations. 

Multimodal distributions: Data distributions with more than one mode. Distributions with two modes are
bimodal.

Percentiles: Those values in a series of observations, arranged in ascending order of magnitude, which divide the
distribution into 100 equal parts (thus the median is the 50th percentile).

Quartiles: The values which divide a series of observations, arranged in ascending order, into four equal parts. Thus 
the second quartile is the median. 

Summary indices: Values summarizing a set of observations. 

Weighted mean: A mean for which individual values in the set are weighted, very often by their respective 
frequencies. 
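The definition of the geometric mean can be illustrated with a short sketch: take logarithms, average them, and take the antilog. A minimal sketch in Python; the titre values are hypothetical, the function name is illustrative, and Python's statistics.geometric_mean (Python 3.8 and later) is used only as a cross-check.

```python
import math
import statistics


def geometric_mean_via_logs(values):
    """Antilog of the arithmetic mean of the logarithms (all values must be > 0)."""
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs strictly positive values")
    return math.exp(statistics.fmean(math.log(v) for v in values))


# Hypothetical antibody titres (reciprocal dilutions).
titres = [10, 20, 40, 80, 160]
print(geometric_mean_via_logs(titres))     # about 40
print(statistics.geometric_mean(titres))   # standard-library cross-check
```

Here the product of the five values is 102 400 000, whose 5th root is 40, so both lines print a value of about 40.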
HANDOUT 5.2



Examples of computation of mean and median for 
grouped data 



Table 5.3 Systolic blood pressure for 240 men (a)

Systolic blood      Frequency    Mid-value of     Product    Cumulative
pressure in mmHg    (f)          the class (x)    (fx)       frequency
(class interval)
Under 100              4            95 (b)           380          4
100-                  16           105              1680         20
110-                  18           115              2070         38
120-                  40           125              5000         78
130-                  66           135              8910        144
140-                  56           145              8120        200
150-                  34           155              5270        234
160 and over           6           165 (b)           990        240
Total                240            -             32 420          -

(a) Data are given in mmHg, but may also be expressed in kPa (1 mmHg = 0.133 kPa).
(b) These are assumed mid-values, with the lowest value being 90 and the highest value 170.



Mean 

The approximate mean is the average of the class mid-values, weighted by the class frequencies:
32 420/240 = 135.1 mmHg.

Median 

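The median blood pressure lies in the interval between 130 and 140 mmHg, the first class whose cumulative
frequency (144) exceeds half of the 240 observations. It is the average of the 120th and 121st observations.
Assuming that the 66 observations in this class are spread evenly over its width of 10 mmHg, and noting that
78 observations lie below 130 mmHg, their estimated values are respectively:

130 + (120 - 78) × 10/66 = 136.36

and 130 + (121 - 78) × 10/66 = 136.52

Therefore the median is (136.36 + 136.52)/2 = 136.4 mmHg.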
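The grouped-data calculations in this handout can be sketched in Python from Table 5.3. The median step follows the rule used above: estimated value of the k-th observation = lower class limit + (k - cumulative frequency below the class) × class width / class frequency. The open-ended classes use the assumed limits of 90 and 170 mmHg from footnote (b); the function names are illustrative.

```python
# Table 5.3: (lower class limit, frequency, mid-value) for each class.
# The open-ended classes use the handout's assumed limits of 90 and 170 mmHg.
CLASSES = [
    (90, 4, 95), (100, 16, 105), (110, 18, 115), (120, 40, 125),
    (130, 66, 135), (140, 56, 145), (150, 34, 155), (160, 6, 165),
]
WIDTH = 10  # class width in mmHg


def grouped_mean(classes):
    """Average of the class mid-values, weighted by class frequency."""
    n = sum(f for _, f, _ in classes)
    return sum(f * mid for _, f, mid in classes) / n


def value_of_kth(classes, k, width):
    """Estimate the k-th ordered observation, assuming values spread evenly within its class."""
    below = 0  # cumulative frequency of all classes below the current one
    for lower, f, _ in classes:
        if below + f >= k:
            return lower + (k - below) * width / f
        below += f
    raise ValueError("k exceeds the number of observations")


def grouped_median(classes, width):
    """Middle observation, or mean of the two middle observations when n is even."""
    n = sum(f for _, f, _ in classes)
    if n % 2 == 1:
        return value_of_kth(classes, (n + 1) // 2, width)
    return (value_of_kth(classes, n // 2, width) + value_of_kth(classes, n // 2 + 1, width)) / 2


print(f"approximate mean   = {grouped_mean(CLASSES):.1f} mmHg")
print(f"approximate median = {grouped_median(CLASSES, WIDTH):.1f} mmHg")
```

It reproduces the approximate mean of 135.1 mmHg and median of 136.4 mmHg.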
OUTLINE 6



Measures of variability 



Introduction to the lesson 

A single summary figure (such as any of the measures of central tendency discussed in
Outline 5) is not enough to describe the characteristics of a population without a measure of
the extent of variability, or spread, of the measurements around this summary index.
Health workers often have to decide whether to classify an individual as healthy or sick,
suffering from a particular disease or not, needing treatment or not, etc. For this task, the
so-called "normal" values of certain clinical, laboratory or radiological measurements provide
the necessary yardstick. But a "normal" value is a statistical concept and depends, to a great
extent, on the distribution of the classifying attribute in the population. Measures of spread,
dispersion or variability are, therefore, essential for understanding, using and interpreting
this concept of "normal" values. No description of any health data by summary indices is
complete without a measure of variability.

Objective of the lesson 

The objective of this lesson is to define and discuss the sources of variation in health data, and 
the various measures of variability, their use, interpretation and limitations. 

Enabling objectives 

At the end of this lesson, the students should be able to: 

(a) Explain the meaning of a measure of variability or dispersion and its place in descriptive 
statistics. 

(b) Explain the uses of the terms: range, inter-quartile range, variance, standard deviation and 
coefficient of variation, as measures of variability of health data. 

(c) Compute the following, given either grouped or non-grouped data, with the aid of reference 
material: 

• range; 

• inter-quartile range; 

• variance; 

• standard deviation; 

• coefficient of variation. 

(d) Describe the relative advantages and disadvantages of the five indices listed above.

(e) Select an appropriate measure of variability for a given data situation. 

(f) Discuss the concept of normality of health data in terms of mean, standard deviation and
percentiles. 
Required previous knowledge

All materials in the previous lessons in this series, particularly the meaning and interpretation of 
measures of central tendency, and other methods for data reduction and presentation, including 
patterns of distributions. 



Lesson content 

Need for measures of variability 

(a) Inherent biological variations, as well as variations from a number of 
other sources, that lead to systematic or non-random variations in health 
measurements. 

(b) The concept of summarizing variability in a single number, in order to
facilitate comparison of variability between different groups.

(c) Uses of "normal values" and "normal range" in medical practice. Examples: 
systolic and diastolic blood pressure, pulse rate, heart rate, height, weight, 
serum cholesterol, haemoglobin levels. 

(d) The concept of using variability as an indicator of homogeneity or
heterogeneity of data.

Measures of variability (for definitions see Handout 6.1) 

• Range 

• Standard deviation

• Variance 

• Coefficient of variation 

Advantages, disadvantages, properties and uses of the measures of variation 

Range 

— Simple to calculate. 

— Easy to understand. 

— The extreme values, and hence the range, depend on sample size: larger samples tend to give wider ranges.

— Not based on all observations, that is, takes no account of the variability of 
observations between the two extreme values. 

— Not readily amenable to further mathematical treatment. 

— Should be used in conjunction with other measures of variability; otherwise 
full frequency distributions, means, etc. should be given. 

Variance, standard deviation, standard error

— Based on all observations. 

— Deviations are taken from the mean, that is, the measure of central tendency.

— Most widely used because of the properties of the theoretical normal curve, 
and because of the importance of variance in inductive statistics (see Outline 
7). 

— For the standard deviation and standard error, the unit is the same as that of the mean; the variance is
expressed in squared units.
Coefficient of variation

— Used for the comparison of relative variability of two distributions. 

— Measures level of variability in the data relative to the average value. 

— It is independent of any unit of measurement, and thus useful for comparison
of variability in two distributions having variables expressed in different
units (for example, height expressed in centimetres for one distribution and 
weight in kilograms for the other). 

— Takes into account each value of the distribution. 
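The measures listed above can be computed for ungrouped data in a few lines. A minimal sketch in Python, assuming the sample variance with divisor n - 1 (see the definition of variance in Handout 6.1); the pulse-rate data and the function name variability are hypothetical.

```python
import math
import statistics


def variability(values):
    """Measures of variability for ungrouped data (sample variance, divisor n - 1)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)                # square root of the sample variance
    return {
        "range": max(values) - min(values),
        "variance": statistics.variance(values),
        "sd": sd,
        "se_mean": sd / math.sqrt(len(values)),  # standard error of the mean
        "cv_percent": 100 * sd / mean,           # SD as a percentage of the mean
    }


# Hypothetical pulse rates (beats per minute).
pulse = [64, 68, 70, 72, 72, 75, 78, 81]
result = variability(pulse)
print(result)
```

For these eight values the mean is 72.5, the range is 17 beats per minute and the variance is 208/7, about 29.7 (beats per minute) squared; the standard deviation, standard error and coefficient of variation all follow from it.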

Establishing "normal" values for health data 

The establishment of "normal" values permits the selection of appropriate
actions in medical practice.

Variability is inherent in all biomedical measurements upon which decisions on
individual patient care or community health programmes are based. It is therefore
necessary to have established standards on which decisions can be based.
These standards are often referred to as "normal values", and are generally based 
on measurements made on population groups categorized as "healthy". In 
statistical reasoning, what occurs most frequently is considered as "normal" 
and the problem is often where to draw a cut-off line between "normal" and 
"abnormal". 

Two types of "normal" values are usually required for medical decisions: the
"point normal" values and the "normal ranges". Point normal values are estimated
by measures of central tendency (refer to Handout 5.1 for definitions of
measures of central tendency and location). Normal ranges give the general level
(in terms of an interval) of a characteristic for healthy population groups. Some
people in the population will have exceptionally high or low values of a particular
characteristic and yet apparently be perfectly healthy. These are called
"outliers". Such exceptional values cannot be regarded as typical of the population
group. Hence a few very extreme measurements are sometimes excluded
from the computation of normal values.

Most biomedical normal ranges have been adopted to ensure that 95% of
randomly selected healthy people would fall within the limits. Where a variable
follows a unimodal and symmetrical distribution, it is easy to compute
the normal range in terms of the mean and standard deviation (SD), by using
the properties of the theoretical normal distribution. For example, for the normal
distribution, the range mean plus or minus 1 SD covers approximately
68%, and mean plus or minus 1.96 SD includes approximately 95%, of the
sampled population. For multimodal or asymmetrical distributions, the computation
of the normal range can be quite involved, although the same principles
apply.
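The coverage figures quoted above (about 68% for mean ± 1 SD, about 95% for mean ± 1.96 SD) and the resulting normal range can be checked with the NormalDist class in Python's statistics module (Python 3.8 and later). A minimal sketch: the mean of 120 and SD of 10 are hypothetical values, the function name normal_range is illustrative, and the approach is only appropriate when the variable is roughly normally distributed.

```python
from statistics import NormalDist


def normal_range(mean, sd, coverage=0.95):
    """Central interval mean +/- z*SD containing `coverage` of a normal distribution."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)   # about 1.96 for coverage = 0.95
    return mean - z * sd, mean + z * sd


std_normal = NormalDist()   # standard normal: mean 0, SD 1
for k in (1, 1.96):
    share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"mean +/- {k} SD covers {share:.1%}")

# Hypothetical example: a systolic blood pressure with mean 120 and SD 10 mmHg.
low, high = normal_range(120, 10)
print(f"95% normal range: {low:.1f} to {high:.1f} mmHg")
```

The hypothetical example gives a 95% normal range of about 100.4 to 139.6 mmHg.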

Very often, normal values differ between geographical areas or between sexes 
or age groups. For example, "normal" blood pressure differs between sexes, 
and also varies with age, and its pattern is not the same in all human populations. 
A statement of normal values must therefore indicate the population referred 
to. 
Coefficient of variation; dispersion; normal values; range; standard deviation; standard error;
theoretical normal (Gaussian) distribution; variance.



Structure of the lesson 

The lesson may proceed in the following sequence. 

(a) Recapitulate the various sources of variation as presented in Outline 1, and illustrate their 
cumulative effect on the validity and reliability of measurements in health data. Distinguish 
between random and systematic variations. 

(b) Describe the nature of measures of variability or dispersion and their place in descriptive 
statistics. Differentiate between a summary index of central tendency and a summary index 
of dispersion, and explain their complementary roles in the study of any characteristic among 
a group of subjects (for example, as indicators of homogeneity and heterogeneity), and for 
comparison between different groups of subjects. Explain how variability may or may not be 
related to the magnitude of the variable, and hence differentiate between indices of absolute 
dispersion and of relative dispersion. 

(c) Give the definitions and methods of computation of the different summary indices of
absolute dispersion commonly encountered in the literature. These should include:

• index based on distributional positions (range); 

• indices summarizing the squares of differences of individual values from the mean (sum 
of squares, variance or mean square, standard deviation); 

• handling open-ended intervals for grouped data. 

(d) Draw attention to the concept of a range of "normal" values, often determined arbitrarily as
the interval spanning the central 95% of values in the frequency distribution (that is, the
range from the 2.5th percentile to the 97.5th percentile), and explain how the standard
deviation is often used to estimate this normal range in the form x̄ ± 1.96 SD.

(e) Give special attention to this use of the standard deviation, which derives from the
properties of the theoretical normal distribution or normal curve. Explain the concept of the
standard normal deviate z (distance from the mean expressed in standard deviation units), and
illustrate how percentiles of the normal distribution are related to values of z. Mention how
the proportions of the normal distribution that lie within or outside various multiples of the SD
below or above the mean (for example, x̄ ± 1 SD, x̄ ± 1.96 SD) can be used to determine the
"normal" range of values. Discuss when and why the standard deviation may or may not be
used in this way for empirical data (observed frequency distributions).

(f) Summarize the uses and limitations of the different measures of variability or dispersion.



Lesson exercises 

The teacher should obtain data that can demonstrate variation in an attribute, such as a
continuous variable, and ask the students to calculate the various measures of variation and to
describe how they can compare variation in variables measured in different units.
■ The following data show duration of illness (in days) for 23 cases of
pneumonia: 

8, 10, 11, 11, 11, 8, 10, 10, 10, 12, 12, 14, 14, 13, 15, 17, 18, 6, 5, 4

Calculate the range, variance, standard deviation, and coefficient of variation 
for the above data. 

Comment, with reference to the above data, on the advantages and
disadvantages of the variance, standard deviation and coefficient of variation, as
measures of variation.



■ Table 6.1 presents data on annual income, in US$, in 300 households, given 
in the form of a frequency distribution. 



Table 6.1 Distribution of household annual income

Household annual      Frequency
income (US$)
Under 100                 2
100-                      4
200-                      9
300-                     10
400-                     22
500-                     68
600-                     85
700-                     58
800-                     25
900-                      8
1000-                     6
1100-                     2
1200 and over             1



Calculate the variance, standard deviation and coefficient of variation. 
Comment on the distribution of household income. 

Calculate within what income range the central 95% of the household annual 
income is likely to fall. 
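For the income exercise, the variance, standard deviation and coefficient of variation of grouped data can be computed from class mid-values weighted by frequency. A minimal sketch in Python; the mid-values of 50 and 1250 for the two open-ended classes are assumptions (in the same spirit as the assumed mid-values in Handout 5.2), the results depend on them, and the function name grouped_stats is illustrative.

```python
import math


def grouped_stats(mids, freqs):
    """Mean, sample variance (divisor n - 1), SD and CV (%) from grouped data."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(mids, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(mids, freqs))  # sum of squares
    variance = ss / (n - 1)
    sd = math.sqrt(variance)
    return mean, variance, sd, 100 * sd / mean


# Table 6.1, class width US$100. ASSUMPTION: the open-ended classes "Under 100"
# and "1200 and over" are given mid-values of 50 and 1250.
mids = [50 + 100 * i for i in range(13)]   # 50, 150, ..., 1250
freqs = [2, 4, 9, 10, 22, 68, 85, 58, 25, 8, 6, 2, 1]

mean, variance, sd, cv = grouped_stats(mids, freqs)
print(f"mean = {mean:.1f}, variance = {variance:.0f}, SD = {sd:.1f}, CV = {cv:.1f}%")
```

Whether mean ± 1.96 SD is a sensible estimate of the central 95% depends on how symmetric the distribution is, which is part of what students are asked to comment on.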
Definitions of new terms and concepts



Coefficient of variation: Standard deviation expressed as a percentage of the mean (this is independent of the 
scale/unit of measurement). 

Normal (or Gaussian) distribution: The continuous frequency distribution of infinite range with the following 
properties: it is bell-shaped; its mean, median and mode are identical; and it is completely defined by the 
mean and standard deviation. 

Normal values: Values regarded as being within the usual range of variation in a given population or population 
subgroup. The range of such values is called the normal range. 

Range: Difference between largest and smallest value in a series of observations. Calculated by: R = x_max - x_min,
where x_max and x_min denote the largest and smallest values, respectively, in a series of observations.

Standard deviation (SD): Root mean square deviation, where deviations have been taken from the mean. This 
equals the square root of the variance, expressed in the units of the original observations. 

Standard error (SE): Standard deviation of a statistic. For example, the standard error of the mean is the standard 
deviation divided by the square root of the sample size. 

Variance: Sum of squared deviations, taken from the mean, divided by the number of observations n (or n - 1 for an
unbiased sample variance), expressed in squares of the unit of the original observations.
OUTLINE 7



Introduction to probability and 
probability distributions 



Introduction to the lesson 

Health workers, particularly clinicians and health managers, often need to take specific 
decisions on individuals or communities based on available data. However, outcomes and 
responses to treatments or situations can rarely be predicted with certainty. Medicine, as 
an inexact science, is "probabilistic" as opposed to being "deterministic". Mathematical 
models are therefore often used to describe observed situations in the health field.
Probability theory allows us to quantify the degree of "uncertainty" in our deductions. Hence
the theory of probability underlies the methods for drawing statistical inferences in 
medicine. 



Objective of the lesson 

The objective of this lesson is to provide an understanding of the basic concepts of probability
that is sufficient to serve as background for the subsequent development of its uses in the
interpretation of the results of medical studies and in decision-making.



Enabling objectives 

At the end of the lesson the students should be able to: 

(a) Explain the meaning and definitions of technical terms used in the study of probabilities and 
probability distributions. 

(b) Explain the terms: mutually exclusive, independent and dependent events. 

(c) Explain and illustrate the operation of the addition and multiplication laws of probability, 
and their elementary uses in medicine. 

(d) Differentiate between discrete and continuous probability distributions. 

(e) Describe the data situations that lead to a binomial distribution.

(f) Describe the normal distribution and its properties.

(g) Apply values in probability tables (at this stage, tables of normal and binomial
probabilities) to solve simple health problems (for example, setting up a "normal" range of
values). 



Required previous knowledge 

Presentation of absolute and relative frequencies and distributions, frequency histograms and 
polygons, and curves representing distributional patterns. 

Common summary indices used in descriptive statistics, including proportions, percentages, 
percentiles, means and standard deviations. 
Lesson content

Concept of probability 

• Definition of probability (subjective definition and the frequency concept). 

• Definitions of technical terms (trials, experiments, outcomes, events, chance, 
odds; see Handout 7.1). 

• Scale of measurement of probability and its interpretations. 

Laws of probability 

• Explanations of simple and compound events. 

• Mutually exclusive and independent events. 

• The addition and multiplication rules. 

• Dependent events and definition of conditional probability. 

Probability distributions 

• Discrete probability distributions (binomial). 

• Continuous probability distributions (normal). 

• Properties and uses of the distributions. 
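The addition and multiplication rules, and conditional probability, can be illustrated with a few lines of arithmetic. A minimal sketch in Python; the prevalences of the two conditions A and B are hypothetical, and the conditions are assumed to be independent.

```python
# Hypothetical prevalences of two conditions A and B, assumed independent.
p_a = 0.10
p_b = 0.20

# Multiplication law for independent events: P(A and B) = P(A) x P(B).
p_both = p_a * p_b

# Addition law (general form): P(A or B) = P(A) + P(B) - P(A and B).
# For mutually exclusive events P(A and B) = 0, so the last term drops out.
p_either = p_a + p_b - p_both

# Conditional probability: P(B/A) = P(A and B) / P(A).
# Under independence this equals P(B): knowing A tells us nothing about B.
p_b_given_a = p_both / p_a

print(f"P(A and B) = {p_both:.3f}")
print(f"P(A or B)  = {p_either:.3f}")
print(f"P(B/A)     = {p_b_given_a:.3f}")
```

With these values the probability of having both conditions is 0.02, of having at least one is 0.28, and P(B/A) equals P(B) = 0.20, as independence implies.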



Additive law of probability; binomial coefficients; binomial distribution; conditional probability;
dichotomous independent events; inductive statistics; multiplication law of probability; mutually 
exclusive events; population; probability; probability distribution; relative frequency; statistical 
inference. 



Structure of the lesson 

The students do not need to know probability theory in detail, but should be acquainted with
some of its basic concepts, principles, rules and applications.

The lesson content may be presented in the following sequence. 

(a) Introduce the idea of subjective probability in everyday life, which is not usually quantified. 

(b) Explain the range of values for probabilities (0-1) and the interchangeable use of the terms 
"chance" and "probability". 

(c) Briefly review the uses of the descriptive statistical methods already learned, and introduce
the concept and meaning of inductive statistics, illustrating with medical data (for example, 
how criteria of abnormality used in diagnosis are based on descriptive information but 
applied to new patients). 

(d) Explain the meaning of such terms as trials, outcomes, events, experiments. 

(e) Explain the relationship between probability and observed proportions in data on a
dichotomous attribute.

Example: what is the chance of finding a person with Type A blood, or the chance of an 
unborn child being male? 
Explain the concepts of independent and of mutually exclusive events, and illustrate how
the additive and multiplication laws of probability operate. 

Example: if two diseases are "independent" what is the probability of finding a person with 
both diseases? What is the chance that a patient with fever has malaria or typhoid? (See 
Handout 7.1 for a definition of independent events.) 

Introduce the concept of a dichotomous population and the data situation for the binomial 
probability distribution of a discrete random variable. 

With reference to dichotomous medical data, discuss the possible outcomes, and their 
probabilities, in a small sample of n observations. 

Example: if treatment of a given disease is effective in 70% of cases, and you have treated 5 
cases of this disease (that is, p = 0.7 and n = 5), what are the chances that the treatment was 
effective in none of them, one of them, two of them, etc.? (Refer to the worked example in
Handout 7.2.) 

From the table of computed values of probabilities (Table 7.1, Handout 7.2) for each of the 
possible outcomes, obtain the cumulative probabilities (upwards and downwards), and give 
the probabilities of finding less than or more than a specified number of "successes". 

Example: from the data in Table 7.1, Handout 7.2, what are the chances of having more than 
3 treatment successes among 5 patients? 

Explain the derivation of these "binomial probabilities" using the binomial equation with 
reference to the material in Handouts 7.1 and 7.2. 

Give other medical examples involving dichotomous attributes to which the binomial 
probability distribution can be applied. (Prepare worked examples for other values of n and 
p if needed.) 

(f) Explain the general concept of continuous probability distributions and cumulative
probability distributions. Review the properties of the normal distribution and discuss the
standardized normal distribution (mean = 0, variance = 1). Using a table of "areas under
the normal curve" (refer to Annex B, Table B.1), explain how to determine the probability
of finding values less than or greater than various specified values of z, with reference to 
medical examples. (Even if this has previously been done, for example, after the lessons on 
the mean and the standard deviation, it should be recalled at this stage to reinforce and 
demonstrate the application. Link this to the setting up of a normal range of values.) 

(g) It is recommended that three handouts be given to the students for use during the lesson:

• Handout 7.1, covering probability and probability distributions, two basic rules of
probability, and the binomial probability distribution.

• Handout 7.2, giving worked examples of binomial probability distributions: 
for n = 5 and p = 0.7 for reference during the lesson;

for n = 10 and p = 0.3 (or other values of n and p) for reference during practical work. 

• Table B.1 in Annex B, presenting a table of areas under the normal curve.
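The binomial probabilities for the treatment example above (p = 0.7, n = 5) can be generated from the binomial equation P(X = k) = C(n, k) p^k (1 - p)^(n - k). A minimal sketch in Python using math.comb (Python 3.8 and later); the function name binomial_pmf is illustrative, and the output should agree with the worked example referred to in Handout 7.2, which is not reproduced here.

```python
from math import comb


def binomial_pmf(n, p):
    """P(X = k) for k = 0..n successes in n independent trials with success probability p."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


n, p = 5, 0.7          # 5 patients, treatment effective in 70% of cases
probs = binomial_pmf(n, p)

cumulative = 0.0
print(" k   P(X = k)   P(X <= k)")
for k, pk in enumerate(probs):
    cumulative += pk
    print(f"{k:2d}   {pk:.5f}    {cumulative:.5f}")

# More than 3 successes: add the upper tail P(X = 4) + P(X = 5).
p_more_than_3 = sum(probs[4:])
print(f"P(X > 3) = {p_more_than_3:.5f}")
```

The chance of more than 3 treatment successes among 5 patients comes out at about 0.528.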

Lesson exercises 

The teacher should prepare exercises that test the students' ability to apply the frequency
distribution definitions to calculate probability of events, and that demonstrate the application of the
laws of probability and the binomial probability distribution with particular reference to health
problems.

To demonstrate and reinforce the concept of a sampling distribution and to show how it is 
governed by laws of probability, let the students generate an empirical (observed) sampling 
distribution and compare it with the theoretical (expected binomial) distribution. One way of 
doing this is to use coloured beads to represent persons with different attributes in a population. 
Give examples of dichotomous medical attributes that can be represented by beads of two 
colours, say black and white, for example, genetic traits (sickle cell anaemia, blood grouping, 
etc.). 



■ From a box containing a large number of beads of two colours (for example, 
black and white), let each student take a random sample of a given size n (for 
example 5); tabulate the number of black (or white) beads seen in each sample. 
This gives an observed sampling distribution. 

Given the actual proportion of black (or white) beads in the box, calculate the 
binomial probability distribution for sample size n. Hence calculate the expected
sampling distribution for the observed number of samples. 

Compare and comment on the observed and expected sampling distributions. 
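Where beads are not available, the bead exercise can be imitated by simulation. A minimal sketch in Python, assuming 30% black beads, samples of n = 5 and 40 students (all illustrative choices); drawing from a large box is modelled as independent draws, which is what makes the binomial distribution apply.

```python
import random
from collections import Counter
from math import comb

# Illustrative assumptions: 30% black beads, samples of 5 beads, 40 students.
P_BLACK, N, STUDENTS = 0.3, 5, 40
rng = random.Random(1)   # fixed seed so the run can be repeated

# Observed sampling distribution: number of black beads in each student's sample.
observed = Counter(sum(rng.random() < P_BLACK for _ in range(N)) for _ in range(STUDENTS))

# Expected frequencies: binomial probability x number of samples.
expected = {
    k: STUDENTS * comb(N, k) * P_BLACK**k * (1 - P_BLACK) ** (N - k)
    for k in range(N + 1)
}

print(" k  observed  expected")
for k in range(N + 1):
    print(f"{k:2d}  {observed.get(k, 0):8d}  {expected[k]:8.2f}")
```

Changing the seed gives a different observed distribution, which illustrates sampling variation; the expected column stays the same.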



The goodness-of-fit will be tested later when the students have learned about the chi-squared 
test. 

The following exercise is designed to demonstrate the application of the laws of probability and 
the binomial probability distribution. 



■ Give data on prevalence of various attributes (for example, diseases) in a 
population, and ask questions on the probability of a person having or not 
having various combinations, or all, or none, of these attributes. 

Ask questions about the expected number of 2- or 3-child families that have 
various numbers of sons or daughters. 

Give an exercise to show the role of probability statements in the context of a 
diagnostic test, for example, concerning its specificity, sensitivity or predictive 
value. 



■ Supposing that a midwife delivering babies at a maternity home does
not know the sex of the baby until it is delivered, how can one estimate
the probability of delivering a male baby, using data from the maternity 
home? 



■ For a couple desiring to have a baby boy following three female births, if it is 
known that the chance of a pregnancy resulting in a male baby is 0.5, what is 
the chance that the fourth pregnancy will result in a male birth? 



■ If, in a hospital over a period of three years, 27 out of 30 babies with neonatal
tetanus have died, what is the probability that a neonate with tetanus will survive,
assuming there has been no change in how tetanus is managed?
HANDOUT 7.1



Definitions of new terms and concepts 



Concepts in probability and inductive statistics 

Conditional probability: The probability of a particular event when it depends on the outcome of some other event.
$P(B \mid A)$ = the probability that event B occurs, given that event A has already occurred.

Descriptive statistics: Statistical methods that describe the characteristic(s) of a finite study group.

Dichotomous attribute: A characteristic classified into only two categories, usually the presence or absence of a
defined condition (for example, sick or not sick; improved or not improved). Some characteristics are inherently
dichotomous (for example, male/female, alive/dead), but any characteristic, whether or not inherently dichotomous,
can be "dichotomized" by defining and identifying one subgroup and putting all other observations into a second
(residual) subgroup.

Dichotomy: Division into two mutually exclusive subclasses. 

Event: One of the outcomes of a trial or an experiment. 

Simple event: An event that cannot be broken down into any other components.

Compound event: An event that consists of at least two simple events.

Independent events: Two events are said to be independent if the presence or absence of one does not 
alter the chances of the other being present, or if the occurrence of one does not alter the chance of 
occurrence of the other. 

Mutually exclusive events: Events that cannot occur simultaneously or be present at the same time.

Odds: The ratio of the probability of an event to its complement, that is, $p/(1-p)$.

Outcome: The result of a trial (experiment).

Probability: An event's long-run relative frequency in repeated trials under similar conditions. 

Scale of probability measurement: Probability can be measured on a continuous scale of values between 0
and 1 (inclusive). An event that is impossible is said to have a probability of occurrence of 0, and an event
that is certain to occur has a probability of occurrence equal to 1. An event with a probability greater
than 0.5 is more likely to occur than not. The notation $P(A)$ represents the probability of occurrence of the
event A.

Trials: Experiments in which results cannot be predicted in advance. 



Two basic rules of probability 

Addition rule: If an event is satisfied by any one of a group of mutually exclusive outcomes, the probability of the 
event is the sum of the probabilities of the outcomes in the group, that is, 

$$P(A \text{ or } B) = P(A) + P(B)$$



Multiplication rule: In a series of independent trials, the probability that each of a specified series of events 
happens is the product of the probabilities of the individual events, that is, 

$$P(A \text{ and } B) = P(A) \cdot P(B)$$

Binomial distribution 

The binomial distribution is formed by the terms of the expansion of the binomial expression:

$$(p+q)^n$$

where n = sample size, p = the probability of a "success", q = the probability of a "failure", and p + q = 1.

Examples:

When n = 2, the terms of the expansion of $(p+q)^2$ are $p^2$, $2pq$ and $q^2$.

When n = 4, the terms of the expansion of $(p+q)^4$ are $p^4$, $4p^3q$, $6p^2q^2$, $4pq^3$ and $q^4$.
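The coefficients in these expansions are the binomial coefficients "n choose r". A small illustrative Python sketch that writes out the terms of $(p+q)^n$ for any n ≥ 1; the function name `binomial_terms` is hypothetical and exists only for this example.

```python
from math import comb


def binomial_terms(n):
    """Return the terms of (p + q)**n as strings, highest power of p first (n >= 1)."""
    terms = []
    for r in range(n, -1, -1):  # r = power of p; n - r = power of q
        coef = comb(n, r)
        parts = []
        if coef != 1:
            parts.append(str(coef))
        if r > 0:
            parts.append("p" if r == 1 else f"p^{r}")
        if n - r > 0:
            parts.append("q" if n - r == 1 else f"q^{n - r}")
        terms.append("".join(parts))
    return terms


print(binomial_terms(2))  # ['p^2', '2pq', 'q^2']
print(binomial_terms(4))  # ['p^4', '4p^3q', '6p^2q^2', '4pq^3', 'q^4']
```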
HANDOUT 7.2



Worked example of a binomial 
probability distribution 



The binomial equation for the probability of specified outcomes of an event 

The probability of observing r successes in n independent repeated trials, given that the probability of a success (p),
and hence of a failure (1 - p), is the same in each trial, can be obtained from the equation:

$$P(r \text{ successes}) = \frac{n!}{r!\,(n-r)!}\, p^r (1-p)^{n-r}$$

where $n!$ is n factorial $= n(n-1)(n-2)\cdots(2)(1)$,
and $0! = 1$ by definition.

At the height of the drought in a given region, it was estimated that 70% of the children under 10 years old were
severely malnourished. If five children under 10 years old were selected at random from the region, what is the
probability that all 5, 4, 3, 2, 1 or none of them are severely malnourished?



Table 7.1 Binomial probabilities

Number malnourished    Terms of binomial expansion    Probability
5 (all)                $p^5$                          0.16807
4                      $5p^4q$                        0.36015
3                      $10p^3q^2$                     0.30870
2                      $10p^2q^3$                     0.13230
1                      $5pq^4$                        0.02835
0 (none)               $q^5$                          0.00243

In this table, n = 5, p = 0.7 and q = 0.3.

Applying the binomial equation:

• When all children are malnourished, r = 5;

hence
$$P(r = 5) = \frac{5!}{(5-5)!\,5!} \times 0.7^5 \times (1-0.7)^{5-5} = 0.7^5 = 0.16807.$$

• When 3 children are malnourished, r = 3;

hence
$$P(r = 3) = \frac{5!}{(5-3)!\,3!} \times 0.7^3 \times (1-0.7)^{5-3} = 10 \times 0.7^3 \times 0.3^2 = 0.30870.$$
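The whole of Table 7.1 can be checked with a few lines of Python. This is a sketch only; `binomial_pmf` is a name chosen for the example, and the loop simply applies the binomial equation above for r = 5 down to 0 with n = 5 and p = 0.7.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(r successes in n independent trials, each with success probability p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n, p = 5, 0.7  # five children; 70% severely malnourished
for r in range(n, -1, -1):
    print(f"{r} malnourished: {binomial_pmf(r, n, p):.5f}")
```

The printed values should reproduce the probability column of Table 7.1, and they sum to 1.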
OUTLINE 8



Sampling and estimating 
population values 



Introduction to the lesson 

Whenever we infer the characteristics of other persons in the population at large from those of
a finite group that we have studied, we are using information about samples to draw
conclusions or make inductive inferences. Such information has some limitations regarding
reliability, precision and validity. Information based on limited samples constitutes most, if not all,
of the medical knowledge that we have of human populations.

Objective of the lesson 

The objective of this lesson is to enable the students to understand the concepts of population,
samples, sampling methods, sampling errors and estimation problems, and how inferences are drawn
on the basis of probability.



Enabling objectives 

At the end of the lesson the students should be able to: 

(a) State the reasons for sampling and for choosing among the different sampling methods.

(b) Distinguish between probability and non-probability sampling. 

(c) Differentiate between sampling and non-sampling errors. 

(d) Differentiate between statistics and parameters. 

(e) List possible advantages and disadvantages of collecting health information through 
samples. 

(f) Discuss the relative advantages and disadvantages of each of the following sampling
methods, as applied to the design of a health survey:

• probability (random) sample; 

• simple random sample; 

• stratified random sample; 

• systematic sample; 

• cluster sample; 

• multistage sample. 

(g) Calculate the standard error of the sample mean or proportion, given the relevant data and 
formulae. 

(h) Differentiate between point and interval estimates of health indices. 

(i) Explain the concept of the central limit theorem.

(j) Explain the meaning and application of confidence limits of an estimate of health indices.
(k) Explain how sampling error is related to sample size and to variability of the characteristic
under study. 

(l) State the information needed to estimate the minimum sample size for a health survey.

Required previous knowledge 

Concept of probability, measures of central tendency and variability. 

Lesson content 

The concept of sampling 

• Population (universe) 

• Sample 

• Sampling 

• Reasons for sampling 

• Sampling unit 

• Sampling frame 

• Sampling fraction 

• Unit of inquiry 

• Probability and non-probability sampling 

Sampling and non-sampling errors 

• Sampling variation 

• Concept of bias 

• Methods for minimizing sampling errors 

• Methods for controlling non-sampling errors 

Sampling distributions 

• Meaning of parameters and statistics 

• The central limit theorem 

Advantages and disadvantages of using sampling methods to collect health data 

Sampling methods and their advantages and disadvantages 

• Simple random 

• Stratified random 

• Systematic 

• Cluster 

• Multistage 

Estimation 

• Concept of standard error 

• Point and interval estimation (mean and proportion) 
• Precision

• Determination of minimum sample size

Bias and selection in sampling; cluster sampling; confidence limits; confidence range; difference
between sampling and non-sampling error; dummy tables; estimation of a population mean;
estimation of a population proportion; health survey; level of confidence; method of sampling;
multistage sampling; point and interval estimates; population (universe); population parameter;
precision of estimates; pre-coded data; probability sampling; quality of sample; representativeness
of a sample; sample statistic; sampling error; sampling fraction; sampling frame; sampling unit;
self-coding record forms; self-selected or natural samples; standard error; statistical estimation;
stratified random sampling; survey questionnaire; systematic sampling; validity of estimates; unit of
inquiry.



Structure of the lesson 

The lesson content may be presented in the following sequence: 

Concept of sampling 

Explain the concept of population, sample and sampling, giving the reasons for sampling:
limited resources available for estimation, lack of access to the total population, or the fact that
sampling may be the only feasible method of collecting the information.

Also explain the following terms: sampling frame, sampling unit, sampling fraction. 

Explain the characteristics of a good sample (the sample must be selected at random to reduce 
bias, be representative to improve validity, and be large enough to increase precision). 

Distinguish between random sampling and non-random (purposive) sampling. 

Sampling and non-sampling errors 

Explain sample statistics and population parameters. 

Describe the use of sample statistics as estimates of population parameters. Because of chance,
different samples give different results, a phenomenon called sampling variation. Explain sampling
errors and the concept of bias.

Explain the concept of sampling error: the unavoidable difference between the value of a sample
statistic and the corresponding population parameter. Increasing the sample size reduces the
sampling error. Non-sampling errors are systematic errors that arise during estimation.
Give examples of non-sampling errors and how they can be reduced.
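The effect of sample size on sampling error can be demonstrated by repeated sampling from an artificial population. The Python sketch below is illustrative: the population of 10 000 "birth weights" (mean 3000 g, standard deviation 500 g) is entirely hypothetical and generated at random, and the spread of the sample means is used as an empirical estimate of the standard error.

```python
import random
import statistics


def spread_of_sample_means(population, n, repeats=2000, seed=0):
    """Standard deviation of the means of `repeats` random samples of size n,
    an empirical estimate of the standard error of the mean."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)


# Hypothetical population of 10 000 birth weights (grams), mean 3000, SD 500.
rng = random.Random(42)
population = [rng.gauss(3000, 500) for _ in range(10_000)]

for n in (10, 40, 160):
    # Expect roughly 500 / sqrt(n): about 158, 79 and 40 grams.
    print(n, round(spread_of_sample_means(population, n)))
```

Quadrupling the sample size roughly halves the sampling error, which is the $1/\sqrt{n}$ behaviour of the standard error.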

Sampling distributions 

Explain the concept of a sampling distribution using simple examples, without invoking
mathematical statistics. Explain the differences between a parameter and a statistic, and explain
that every sample statistic belongs to a sampling distribution.

Describe the principles and applications of the central limit theorem, which states that, for
variables with a finite variance, whether normally distributed or not, the distribution of the sample
mean tends towards a normal distribution as the sample size increases.
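The central limit theorem can be illustrated by simulation. The sketch below is a teaching aid under stated assumptions: the population is a hypothetical, markedly skewed exponential distribution (for example, lengths of hospital stay) with mean and standard deviation both equal to 5, and samples of size 30 are used.

```python
import random
import statistics


def simulate_sample_means(n, repeats=5000, mean=5.0, seed=7):
    """Means of `repeats` samples of size n from an exponential population
    (markedly skewed) whose mean and standard deviation both equal `mean`."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1 / mean) for _ in range(n))
        for _ in range(repeats)
    ]


n, mu = 30, 5.0
means = simulate_sample_means(n, mean=mu)
se = mu / n ** 0.5  # theoretical standard error, sigma / sqrt(n)
share_within = sum(abs(m - mu) < 1.96 * se for m in means) / len(means)

print(round(statistics.fmean(means), 2))  # close to 5.0
print(round(statistics.stdev(means), 2))  # close to 5.0 / sqrt(30), about 0.91
print(round(share_within, 3))             # close to 0.95 if the means are roughly normal
```

Although individual values are strongly skewed, about 95% of the sample means fall within 1.96 standard errors of the population mean, as a normal distribution would predict.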






Advantages and disadvantages of sampling 

Give examples to demonstrate that, by using a well-chosen and reasonably large sample, the
estimates will be close to the true population values and the sample will cover the study population
adequately. The advantages and disadvantages set out in Handout 8.2 should also be discussed.

Methods of sampling, their advantages and disadvantages 

The various methods of sampling must be explained and their relative advantages and
disadvantages should be discussed (see Handout 8.2).

Simple random sampling is useful when the sampling frame is well described and not too large.
Systematic sampling requires either a list of the sampling units or that the units be arranged in an
ordered sequence. Cluster sampling is useful when sampling units form logical groupings and when
a sampling frame is difficult to obtain.
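The selection steps for three of these methods can be sketched in Python. This is an illustrative outline only: the frame of 200 households, the household labels and the urban/rural split are hypothetical, and the functions use Python's `random` module in place of a table of random numbers.

```python
import random


def simple_random_sample(frame, n, rng):
    """Every unit (and every set of n units) has the same chance of selection."""
    return rng.sample(frame, n)


def systematic_sample(frame, k, rng):
    """Every k-th unit after a random start among the first k (sampling fraction 1/k)."""
    start = rng.randrange(k)
    return frame[start::k]


def stratified_sample(strata, fraction, rng):
    """Simple random sample from each stratum, using the same sampling fraction."""
    chosen = []
    for units in strata.values():
        chosen.extend(rng.sample(units, round(fraction * len(units))))
    return chosen


rng = random.Random(2024)
households = [f"HH{i:03d}" for i in range(1, 201)]  # hypothetical frame of 200 households
print(len(simple_random_sample(households, 20, rng)))  # 20
print(len(systematic_sample(households, 10, rng)))     # 20 (fraction 1/10)
strata = {"urban": households[:60], "rural": households[60:]}
print(len(stratified_sample(strata, 0.1, rng)))        # 6 + 14 = 20
```

Which households are chosen changes with the random seed, but the sample sizes do not.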

Estimation 

Explain the concepts of statistical estimation. The following should be covered: 

• The context of statistical estimation: the need to estimate population parameters from sample 
statistics; the problem posed by sampling error for making reliable estimates; the concepts of 
point and interval estimation; validity and precision of a statistical estimate. 

• The concepts of confidence limits and level of confidence: the connection between sampling 
distributions, confidence limits and levels of confidence. Interval estimation: estimation of a 
population parameter in terms of an interval that has a specified probability of containing the 
true value. The interval is the confidence interval, and the limits are the confidence limits. 

• The estimation of normal anthropometric values for a population, with examples; estimation 
of mean birth weight from hospital births; and estimation of disease prevalence in morbidity 
surveys. 

Determination of minimum sample size¹

Discuss the various factors that determine minimum sample size. Describe the use of the
equations given in Handout 8.3 to estimate the sample size required to achieve a desired precision for
an estimate, expressed in terms of a confidence range. The following should also be covered:

• The relation between confidence limits and standard error: the meaning of standard error; the 
basis for derivation and distribution of confidence limits in terms of the standard error. 

• Computation of the standard error of the mean and proportion, with examples from the 
literature to illustrate their computation and use in statistical estimation. 
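The two standard-error formulae used in this lesson, $s/\sqrt{n}$ for a mean and $\sqrt{pq/n}$ for a proportion, can be written as small Python functions. A minimal sketch; the example of 36 anaemic children out of 120 is hypothetical.

```python
import math
import statistics


def se_mean(values):
    """Standard error of a sample mean: s / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def se_proportion(a, n):
    """Standard error of a sample proportion p = a/n: sqrt(p * q / n), with q = 1 - p."""
    p = a / n
    return math.sqrt(p * (1 - p) / n)


# Hypothetical data: 36 of 120 children found anaemic (p = 0.30).
print(round(se_proportion(36, 120), 4))  # 0.0418
```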



Lesson exercises 

The exercises for this lesson should help the students crystallize the concepts of
sampling and estimation of population values covered in the lesson. The emphasis should be not
on memorizing formulae but on using them appropriately and interpreting the results.
The exercises should, in particular, cover all the major points indicated in the enabling
objectives of the lesson (reasons for sampling, the advantages and disadvantages of the different
sampling methods, interpretation of confidence intervals, etc.).



¹ See Lwanga SK, Lemeshow S. Sample size determination in health studies: a practical manual. Geneva, World Health
Organization, 1991.
■ What sampling method would you recommend in the following instances?

• Determining the proportion of undernourished five-year-olds in a village. 

• Investigating nutritional status of preschool children. 

• Selecting maternity records for the study of previous abortions or duration of 
postnatal stay. 



■ In the estimation of immunization coverage in a province, data on seven 
children aged 12-23 months in 30 clusters are used to determine the proportion 
of fully immunized children in the province. 

• Give three reasons why the cluster sampling method is used in such a survey. 

• Give two sources of systematic error and two sources of random error that 
may be associated with an immunization coverage survey. 

• In the immunization coverage survey, the 30 villages are selected by
systematic sampling. If the investigator uses 30 clusters that are easy to reach, what
is the type of sampling method in this case, and what possible sampling errors
would be associated with the method?



■ If the results obtained from an immunization survey indicate that 48% of
the children were fully immunized (standard error 3%), calculate the 95%
interval estimate of fully immunized children in the study.



■ In a family planning clinic, there are 2300 clients. Suppose the anticipated 
prevalence of HIV infection is 3% and the investigator is willing to accept an 
absolute error of 1%. What is the minimum sample size required to estimate the
prevalence of HIV with 95% confidence? 
HANDOUT 8.1

Definitions of new terms and concepts



Confidence limits: The upper and lower limits of the interval in interval estimation. The interval itself is called the
confidence interval or confidence range. Confidence limits are so called because they are determined in
accordance with a specified or conventional level of confidence or probability that these limits will in fact include
the population parameter being estimated. Thus, 95% confidence limits are values between which we are
95% confident that the population parameter being estimated will lie. Confidence limits can often be derived
from the standard error.

Interval estimation: Providing an estimate of a population parameter in terms of an interval or range of values 
within which it is likely to lie. 

Level of confidence: Conventionally 95% or 0.95, but may be set higher or lower as desired. 

Point estimation: Providing an estimate of a population parameter in terms of a single value that it is most likely to 
have. A point estimate is usually provided by a sample statistic. By itself, point estimation ignores sampling 
error. 

Population: Any specified group (usually large) of persons, things, or measurement values. 

Population parameter: A descriptive index whose value refers to the population at large, as opposed to a sample 
of the population (for example, a population mean or population proportion). 

Precision of an estimate: The inverse of the standard error of the estimate. The smaller the sampling error that is likely
to occur, the greater the precision; that is, the smaller the confidence range, the greater the precision. Hence,
precision can be specified in terms of the confidence range or the standard error.

Sample: A subset of a population, whose properties have been, or are to be, generalized to the population. 

Sample statistic: A descriptive index, the value of which is obtained from observations in a sample (for example, a 
sample mean or a sample proportion). 

Sampling: The process of selecting a sample from a population. 

Sampling distribution: The distribution of probabilities with which sampling error of different magnitudes can 
occur purely by chance for a particular sample statistic and sample size. It can be demonstrated experimentally 
by tabulating the values of the same sample statistic obtained from repeated samples of the same size taken 
randomly from the same population. It can also be calculated theoretically (for example, using the binomial or 
the normal sampling distribution). Every sample statistic is a member of a sampling distribution, that is, the 
distribution of values of that statistic that can be expected to occur in different samples of the same size drawn 
randomly from the same universe. 

Sampling error: A difference that occurs purely by chance between the value of a sample statistic and that of the 
corresponding population parameter (for example, the difference between the value of the mean of a random 
sample and that of the universe). Sampling error cannot be avoided or totally eliminated, and must always be 
allowed for when making inferences or drawing conclusions from sample statistics. It can be reduced by 
increasing sample size or using a more appropriate sampling method. 

Sampling fraction: The proportion of sampling units to be selected from a specified sampling frame for inclusion in 
the sample. 



Sampling frame: The set of sampling units from which a sample is to be selected. For example, a list of names, or 
places, or other items to be used as sampling units. 

Sampling unit: The unit of selection in the sampling process. For example, a person, a household or a district. It is 
not necessarily the unit of observation or study. 

Standard error (SE): The standard deviation of the sampling distribution of a statistic.

Unit of inquiry: The smallest unit on which data are collected.

Universe (of a sample): The population of values, of which the values observed in the sample are a random sample, 
and to which the properties of the sample can validly be generalized. The universe of a sample may be an 
abstract or a real population of values, and it may be finite or infinite, depending on the type of sample and the 
nature of the information under study. 

Validity of an estimate: The extent to which an estimate corresponds to the parameter it is estimating. It depends,
not on the size of the sample, but on the representativeness of the sample. Hence it depends on the type or
nature of the sample, how it was selected, and on the accuracy of the information from which it was calculated
and of the calculation itself.
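As the definition of confidence limits notes, these limits can often be derived from the standard error. A minimal Python sketch, assuming the sampling distribution of the estimate is approximately normal so that the 95% limits are the estimate ± 1.96 SE; the estimate of 0.30 with SE 0.0418 is hypothetical.

```python
def confidence_limits(estimate, se, z=1.96):
    """Approximate confidence limits: estimate ± z × SE (z = 1.96 for 95% confidence),
    assuming an approximately normal sampling distribution."""
    return estimate - z * se, estimate + z * se


# Hypothetical: prevalence estimate 0.30 with standard error 0.0418.
low, high = confidence_limits(0.30, 0.0418)
print(f"95% CI: {low:.3f} to {high:.3f}")  # 95% CI: 0.218 to 0.382
```

A different level of confidence only changes z (for example, about 2.58 for 99%).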
HANDOUT 8.2



Methods of sampling, and their advantages 
and disadvantages 



Sampling 



Advantages

• Sampling reduces demands on resources such as finance,
personnel and materials.

• Results are obtained more quickly.

• Sampling may lead to better accuracy of collected data; a
smaller sample allows more effort to be made to reduce
non-sampling errors and non-response biases.

• Precise allowance can be made for sampling error (which
can be found by calculation), although not for non-sampling
errors.

Disadvantages

• There is always a sampling error.

• Sampling may create a feeling of
discrimination within the population.

• Sampling may be inadvisable where every unit in the
population is legally required to have a record.

• For rare events, small samples may not yield sufficient
cases for study.



Probability sampling 

• All individuals (elements) in the population have a known chance (probability) of selection. The chance of selection 
need not be the same for each individual or element. 

• The knowledge of the selection probability is in contrast with the situation for non-probability sampling techniques, 
such as quota and chunk sampling. 

• There must be an identified sampling frame, whether of individual elements or clusters of elements, from which the 
sample is to be drawn. 



Simple random sampling 

• Every sample of the same size has the same chance of being selected. 

• Every sampling unit in the sampling frame has the same chance of being selected.

• Random selection from the sampling frame can be done by balloting, using a table of random numbers, or employ- 
ing a computer. 



Advantages

• Because every unit in the population has an equal chance of
being included in the sample, the sample can be expected to be
representative, being subject only to sampling error.

• Estimates are easy to calculate.

Disadvantages

• If the sampling frame is large, this method may be
impracticable because of the difficulty and expense of
constructing or updating it in large-scale surveys.

• Minority subgroups of interest in the population may not
be present in the sample in sufficient numbers for study.



Stratified random sampling 

• The population is first divided into groups or strata according to a characteristic of interest (for example, sex, age,
geographical location).

• A simple random sample is then selected from each stratum using the same sampling fraction, unless otherwise
prescribed for special reasons.

Advantages

• Every unit in a stratum has the same chance of being
selected.

• Using the same sampling fraction for all strata ensures
proportionate representation in the sample of the
characteristic being stratified.

• Adequate representation of minority subgroups of interest
can be ensured by stratification and by varying the sampling
fraction between strata as required.

Disadvantages

• The sampling frame of the entire population has to be
prepared separately for each stratum.

• Varying the sampling fraction between strata, to ensure
selection of sufficient numbers in minority subgroups for
study, affects the proportional representativeness of the
subgroups in the sample as a whole.



Systematic sampling 

• Involves the selection of every kth unit in the population or the sampling frame, where 1/k is the sampling fraction.

• The first unit to be selected is selected at random from among the first k units.

Advantages

• The sample is easy to select.

• A suitable sampling frame can be identified more easily.

• The sample is evenly spread over the entire reference
population.

Disadvantages

• The sample may be biased if a hidden periodicity in the
population coincides with that of the selection.

• It is difficult to assess the precision of the estimate from
one survey.



Cluster sampling 

• The population is first divided into clusters of homogeneous units, usually based on geographical contiguity. 

• A sample of such clusters is then selected. 

• All the units in the selected clusters are then examined or studied. 



Advantages

• Cuts down on the cost of preparing a sampling frame.

• Cuts down on the cost of travelling between selected units.

• Eliminates the problem of "packing" (in health surveys,
especially those involving case finding and treatment, it is
not unusual for neighbouring houses not included in the
sample to transfer their households temporarily to a
selected house).

Disadvantages

• Sampling error is usually higher than for a simple random
sample of the same size.



Multistage sampling 

• Selection is done in stages until the final sampling units (for example, households or persons) are arrived at. 

• In the first stage, a list of large-sized sampling units is prepared. These may be towns, or villages or schools. 

• A sample of these is selected at random, with probability of selection proportional to size. 

• For each of the selected first-stage units, a list of smaller sampling units is prepared. (For example, if the first-stage 
units are towns, then second-stage units may be houses or households.) 

• A sample of these second-stage units is then randomly selected from each of the selected first-stage units. These 
are then studied. 

• The procedure may contain three or more stages. 



Advantage

• Cuts down the cost of preparing a sampling frame.

Disadvantage

• Sampling error is increased compared with a simple
random sample of the same size.
HANDOUT 8.3



Examples of sample size determination 



Determination of minimum sample size 

The minimum sample size (n) depends on the: 

• objective; 

• design of the study; 

• plan for statistical analysis; 

• accuracy of the measurements to be made (d);

• degree of precision required for generalization; 

• degree of confidence with which to conclude. 

With simple random sampling, for a given magnitude of confidence interval, the precision (z) can be measured by:
$$z = d/\mathrm{SE}.$$

If we want a 95% confidence interval, z must be 1.96 (see Table B.1, Annex B). Since the SE depends on n, we can 
calculate the value of n required to achieve the chosen level of confidence. 

If s is the sample estimate of the population standard deviation (see Outline 6), then the standard error (SE) of the
mean, for a sample of size n, is $s/\sqrt{n}$.

For estimating a population mean, with $\mathrm{SE} = s/\sqrt{n}$, the minimum required sample size, in general, is:
$$n = s^2/\mathrm{SE}^2 = z^2 s^2/d^2.$$

For a sample of size n, involving a binomial distribution with probability p (see Outline 7), let a individuals be
observed with the relevant characteristic. Then the standard error of the estimate of p (that is, a/n) is $\sqrt{pq/n}$, where
$q = 1 - p$.

Since $\mathrm{SE} = \sqrt{pq/n}$,

$$n = pq/\mathrm{SE}^2 = z^2 pq/d^2.$$

These results apply only if sampling is from a very large (theoretically infinite) population, where the ratio of the 
sample size to the population size is very small. 

If sampling is from a finite population of size N, then the minimum sample sizes are:

$$n = z^2 s^2/(d^2 + z^2 s^2/N)$$

for estimating the mean, and:

$$n = z^2 pq/(d^2 + z^2 pq/N)$$

for estimating p in the binomial distribution.



If $n_0$ is the sample size required for an infinite population, the finite population sample size is:
$$n = n_0/(1 + n_0/N).$$

Sampling for a quantitative characteristic 

When sampling for a quantitative characteristic (for example, the mean level of haemoglobin in a population), one 
needs to state: 

— how precisely one wishes to estimate this mean level; that is, the amount of sampling error that can be tolerated 
(d), in either absolute or relative terms; 

— the standard deviation (s) of the distribution of haemoglobin in the population; 

— the chance the experimenter is willing to take to get an unlucky sample giving a sampling error greater than d ; 
a 5% chance of error (that is, a 95% confidence interval) is conventional. 

This means that $\bar{x} \pm d$ are the required 95% confidence limits, so that $d = 1.96\,\mathrm{SE}$, where $\mathrm{SE} = s/\sqrt{n}$. Hence,
$d/1.96 = s/\sqrt{n}$, and therefore the required (minimum) sample size for a very large population is given by:

$$n = (1.96)^2 s^2/d^2.$$

Example 1

A health officer wishes to estimate the mean haemoglobin level in a defined community. Preliminary information is
that this mean is about 150 mg/l with a standard deviation of 32 mg/l. If a sampling error of up to 5 mg/l in the
estimate is to be tolerated, how many subjects should be included in the study?

Here, s = 32 mg/l and d = 5 mg/l.

If the population is assumed to be very large, the required minimum sample size would be:

$$n = (1.96)^2 \times 32^2/5^2 = 157.4.$$

Thus, the study needs at least 158 persons.

If the community to be sampled has 1000 people, the required minimum sample size would be:

$$n = 157.4/(1 + 157.4/1000) = 136.0.$$

Therefore at least 136 people would have to be studied. For a larger community with, for example, N = 3000 people,
the required sample size would be:

$$n = 157.4/(1 + 157.4/3000) = 149.6.$$

At least approximately 150 people would have to be studied.

Sampling for an attribute

When sampling for an attribute (to estimate the proportion of persons with a certain characteristic in a population)
one needs to state:

— a rough approximation to the proportion (p);

— the sampling error that can be tolerated (d) in either absolute or relative terms;

— the acceptable chance of an unlucky sample (conventionally 5%).

The minimum sample required, for a very large population, is then:

$$n = (1.96)^2 p(1-p)/d^2.$$

Example 2 

If p = 0.26 and d = 0.03, then, for a very large population:

$$n = (1.96)^2 \times 0.26 \times 0.74/(0.03)^2 = 821.2.$$

Thus, the study should include at least 822 persons.

If the sample were from a relatively small population of, for example, 3000 people, the required minimum sample
could be obtained from the above estimate by adjustment as:

$$n = 821.2/(1 + 821.2/3000) = 644.7.$$

Thus the study should include at least 645 people. 
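The sample-size formulae of this handout can be collected into a short Python sketch for checking classroom exercises. It is illustrative only: the function names are hypothetical, z defaults to 1.96 (95% confidence), and results are rounded up because a sample cannot contain a fraction of a person. Because the code does not round intermediate values, its unrounded results differ very slightly from the hand calculations above, but the rounded-up sample sizes agree.

```python
import math

Z95 = 1.96  # z for a 95% confidence interval


def n_for_mean(s, d, z=Z95):
    """Minimum n to estimate a mean to within ±d, very large population: z²s²/d²."""
    return (z * s / d) ** 2


def n_for_proportion(p, d, z=Z95):
    """Minimum n to estimate a proportion to within ±d, very large population: z²pq/d²."""
    return z**2 * p * (1 - p) / d**2


def finite_population(n0, N):
    """Adjust an infinite-population sample size n0 for a population of size N."""
    return n0 / (1 + n0 / N)


# Example 1: s = 32, d = 5
n0 = n_for_mean(32, 5)
print(math.ceil(n0), math.ceil(finite_population(n0, 1000)),
      math.ceil(finite_population(n0, 3000)))  # 158 136 150

# Example 2: p = 0.26, d = 0.03
m0 = n_for_proportion(0.26, 0.03)
print(math.ceil(m0), math.ceil(finite_population(m0, 3000)))  # 822 645
```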
OUTLINE 9



Tests of statistical significance 



Introduction to the lesson 

Tests of significance are standard statistical procedures for drawing inferences from sample
estimates about unknown population parameters. Sample estimates are never exact, being subject
to sampling errors. In the design of any medical research, attempts are made to reduce these
sampling errors. Tests of significance allow us to decide whether the sample estimates, or the
differences between estimates, are within their normal biological variation, commonly called
variability due to chance.

Chance variation can give rise to differences between the samples being studied, so every time
a difference is observed the question arises as to its statistical significance, that is, whether the
difference is unlikely to have occurred by chance alone.

Objective of the lesson 

The objective of this lesson is to enable the students to understand the meaning and application 
of tests of significance and their role in statistical inference. Emphasis is placed on their uses and 
interpretation rather than on the theory and methodology of the tests. 

Enabling objectives 

At the end of the lesson the students should be able to: 

(a) Explain the context and meaning of a statistical hypothesis.

(b) Explain when, and why, a test of significance needs to be carried out. 

(c) Explain the procedures for carrying out tests of significance. 

(d) Differentiate between type 1 and type 2 errors in hypothesis testing. 

(e) Explain the possible outcomes of a test of statistical significance and their respective
interpretations in relation to the context of the test.

(f) Differentiate between statistical and medical significance.

(g) Select an appropriate test statistic for the comparison of two means, for independent and 
dependent samples. 

(h) Select an appropriate test statistic for the comparison of two proportions. 

(i) Carry out an appropriate test of statistical significance for the difference between two means,
for independent and dependent samples. 

(j) Carry out an appropriate test of statistical significance for the difference between two 
proportions. 

Required previous knowledge 

The students should have covered the material in all preceding lessons. It is desirable, at the
beginning of this lesson, to stress and check that the students have attained the enabling
objectives of Outlines 6 and 7 and, in particular, that they have understood the concepts of
sampling error and sampling distributions (Outline 8).

Lesson content 

Construct an outline of the lesson with reference to the definitions and
explanations of new terms and concepts in Handout 9.1, with the following content.

Nature of statistical hypotheses

• The null and alternative hypotheses

• Need to test the null hypothesis 

• The meaning of a test of significance 

Situations for tests of significance 

• Comparison of sample estimates against a specified standard 

• Comparison between two sample estimates 

Procedure for testing a statistical hypothesis

• State the null hypothesis 

• State the alternative hypothesis (indicate 1-tail or 2-tail) 

• State the level of significance (explain type 1 and type 2 errors) 

• Choose the test statistic (explain parametric and non-parametric tests) 

• Compute the numerical value of the test statistic from the observed data 

• Compare the calculated value of the test statistic with the critical value
given in the appropriate standard distribution table at the chosen level of
significance

• Decide whether or not to reject the null hypothesis according to the p-value 

Interpretation of p-values

• Statistical significance versus medical importance or significance 

• Role of sample size in determining statistical significance 

Comparison of two mean values 

• Independent versus dependent samples (give practical examples) 

• The t-test for each statistical design

• The standard errors of the difference between two independent sample means 
for data situations when equal or unequal variances are assumed 

Comparison of two sample proportions 

• Use of z for large sample sizes 

• Use of the t-test for small sample sizes

• Pooled or non-pooled estimate of variance for standard error of a difference

• The use of the χ² (chi-squared) test to test for association between two
categorical variables when data are presented in 2 × 2 contingency tables

• Yates' correction for continuity 

• Fisher's Exact Probability Test 









Alternative hypothesis; degrees of freedom; hypothesis testing; level of significance; null hypothesis;
1-tailed and 2-tailed tests; p-value; probability of a difference occurring purely by chance;
rejection of a hypothesis; statistical significance; test statistic (z, t, χ²); type 1 and type 2 errors.



Structure of the lesson 

During the lesson, liberal use should be made of examples taken from the literature to illustrate
the role of statistical significance in interpreting data and drawing conclusions.

Worked examples of how a test of significance is actually carried out should be given to the
students. Examples of the z-test, the χ² test and the t-test are suggested in Handouts 9.2 and 9.3.
The lesson content may be presented in the following sequence.

(a) Outline the context and concept of a statistical test of significance. Make reference to: 

• differences, for example, between the means of certain biochemical, physiological,
demographic or other health measurements in different samples, or between the proportions
with certain attributes in different samples, or between the observed and expected numbers
of occurrences of certain events;

• formulation and testing of the null hypothesis; 

• the probability that a difference of a given magnitude or greater magnitude can occur 
purely by chance; illustrate this in relation to the theoretical sampling distribution; 

• the direction of the difference and its implications for 1-tailed or 2-tailed tests;

• the probability of being wrong in rejecting or not rejecting a hypothesis; type 1 and type 2 
errors. 

(b) Introduce the concept of level of significance

• The value below which p (the probability) must fall for an observed difference to be
considered "unlikely" to have arisen by chance, and hence for the null hypothesis to be
rejected and the difference to be described as statistically significant.

• Describe the conventional levels of significance, i.e. "significant" for p < 0.05; "highly
significant" for p < 0.01; "not significant" for p ≥ 0.05.

(c) Describe the role of significance testing and the implications of the outcome 

• Discuss the possible causes of the observed difference: 

— chance (the null hypothesis); 

— the factor under study; 

— other "real" factors; 

— "spurious" factors, such as bias and non-comparability. 
• The test of significance deals only with the factor of chance; discuss how the other
possible causes of an observed difference are dealt with. Emphasize the difference between
statistical and medical significance. For example:

— a statistically significant difference that is of no clinical importance;

— a difference that is not statistically significant but whose results point to possible
clinical or medical importance.

• Discuss possible follow-up as a result of a statistical test of significance, for example,
repeating the study with an enlarged sample size.

(d) Outline the methodology of the various tests of significance

There are many types of tests of significance, catering for different types of data and of differences
being examined. The most commonly encountered are the z-test, the t-test and the χ² test.
Mention only the usefulness of the χ² test and indicate that this test statistic will be treated in
detail in the next lesson. The students should carry out at least one type of test themselves to
learn the concepts and principles involved; that is, how to:

• select the appropriate test to be used; 

• differentiate between parametric and non-parametric tests; 

• calculate the test statistic; 

• evaluate its magnitude in relation to its theoretical sampling distribution, in terms of the 
probability that this magnitude could have arisen purely by chance (if the null hypothesis 
were true); 

• decide whether the difference is significant and, if so, at what level of significance. 

Refer to the worked examples given in Handouts 9.2 and 9.3. 
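For teachers who want to demonstrate the last two steps on a computer, the sketch below converts a calculated z value into a 2-tailed p-value and applies the conventional wording given earlier in this outline. It assumes the test statistic follows the standard normal distribution (as for the z-test); the function names and the z values are hypothetical, chosen only for illustration.

```python
import math

def two_tailed_p(z):
    """Two-tailed p-value for a statistic with a standard normal sampling distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

def verdict(p):
    """Conventional wording for the outcome of a test of significance."""
    if p < 0.01:
        return "highly significant: reject the null hypothesis"
    if p < 0.05:
        return "significant: reject the null hypothesis"
    return "not significant: do not reject the null hypothesis"

# Hypothetical values of a calculated z statistic
for z in (1.5, 2.2, 3.1):
    p = two_tailed_p(z)
    print(f"z = {z}: p = {p:.4f}, {verdict(p)}")
```

Using math.erfc avoids the need for a printed normal table; the value 1.96 still corresponds to p = 0.05 (2-tailed).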



Lesson exercises 

The class exercises should emphasize the proper selection of the test to be used in each specific 
situation and how to interpret the results obtained. The teacher should obtain a data set which 
has both categorical and continuous variables that can be used for the various tests on means in 
dependent and independent situations and in the case of proportions for small and large data 
sets. 

Class exercises are given to provide practice in carrying out tests of significance, and interpreting 
the results in the context of the study objectives. 



■ For each of the following comparisons, name the appropriate test of significance:

• mean weight for preschool boys and girls; 

• mean family size for urban and rural families; 

• serum albumin values for women using an intrauterine contraceptive device 
and for women not using such a device; 

• number of sexual partners of HIV-positive men before and after two years of
counselling;

• temperature of children with fever taken before treatment and one hour after
treatment.
■ The average clinic utilization rate for 1152 infants who reported to Kasangati
Health Clinic from 1961 to 1979 is provided in Table 9.1. (Kasangati Health
Clinic is the field station for Makerere Medical School, Institute of Public Health,
Kampala, Uganda.) The study was reported in the East African medical journal
(March, 1994).



Table 9.1 Average clinic utilization rate, Kasangati Health Clinic, Uganda

Year      Number of infants    Mean utilization rate
1961                      3                      1.7
1962                      9                      2.7
1963                     55                      2.5
1964                     88                      1.9
1965                    102                      3.1
1966                    164                      3.1
1967                    147                      2.8
1968                     67                      2.0
1969                     60                      1.6
1961-1969: Mean = 2.6, SD = 1.9

1970                     16                      1.6
1971                     90                      3.5
1972                     80                      4.0
1973                     71                      3.9
1974                     65                      3.4
1975                     43                      3.0
1976                     42                      3.2
1977                     33                      2.9
1978                     11                      2.1
1979                      6                      1.2
1970-1979: Mean = 3.4, SD = 2.3



Source: Biritwum RB. Record keeping on early childhood diseases in two decades, 
at the health centre level in Uganda. East African medical journal, 1994, 71: 
199-203. Reproduced by permission. 



• Determine whether the average utilization rate per child in the 1960s is
statistically different from the rate in the 1970s.

• Comment on the distribution of the data for the test selected. 
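A teacher's check of this exercise could look like the sketch below. It assumes that the SDs in Table 9.1 are per-infant standard deviations for each decade and, because both groups are large, uses the z-test for two independent means with an unequal-variance standard error. The decade totals and infant-weighted means are first recomputed from the yearly rows as a consistency check against the printed means. Note that the SDs are large relative to the means of non-negative rates, which suggests a skewed distribution; it is the large sample sizes that make the normal approximation reasonable here.

```python
import math

# Table 9.1 rows: (number of infants, mean utilization rate)
sixties = [(3, 1.7), (9, 2.7), (55, 2.5), (88, 1.9), (102, 3.1),
           (164, 3.1), (147, 2.8), (67, 2.0), (60, 1.6)]            # 1961-1969
seventies = [(16, 1.6), (90, 3.5), (80, 4.0), (71, 3.9), (65, 3.4),
             (43, 3.0), (42, 3.2), (33, 2.9), (11, 2.1), (6, 1.2)]  # 1970-1979

def total_and_weighted_mean(rows):
    """Total number of infants and the infant-weighted mean rate."""
    n = sum(count for count, _ in rows)
    return n, sum(count * rate for count, rate in rows) / n

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z-test for two independent means (large samples, unequal-variance SE)."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean2 - mean1) / se
    return z, math.erfc(abs(z) / math.sqrt(2))  # two-tailed p-value

n1, m1 = total_and_weighted_mean(sixties)    # 695 infants, mean about 2.6
n2, m2 = total_and_weighted_mean(seventies)  # 457 infants, mean about 3.4
z, p = two_sample_z(2.6, 1.9, n1, 3.4, 2.3, n2)  # means and SDs as printed
print(n1, n2, round(m1, 1), round(m2, 1))
print(f"z = {z:.2f}, two-tailed p = {p:.1e}")
```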



■ Table 9.2 gives a summary of the data on immunization of children in Yemen,
as reported in the Demographic and Maternal and Child Health Survey, 1991/1992
(source: Demographic and Health Surveys, 1991-92, Macro International
Inc., Calverton, MD, USA).
Table 9.2 Summary of data on immunization of children, Yemen

                                           Percentage of children who received
                                          DPT              Polio                             Number of
Characteristics                 BCG     1     2    3+     1     2    3+ Measles All(a)  None  children
Child's age (months)
  <6                           29.2  28.9  18.6   9.6  28.9  18.6   9.6    13.5    7.6  68.0       718
  6-11                         47.3  48.6  42.8  30.9  48.6  42.8  30.9    34.8   25.1  49.6       802
  12-17                        58.8  60.3  56.3  48.7  60.3  56.3  48.7    51.4   45.9  37.3       627
  18-23                        61.9  62.4  54.6  45.8  62.4  54.6  45.8    51.6   44.0  35.8       628
  24-59                        66.6  65.8  61.3  53.3  65.9  61.3  53.3    58.5   50.8  30.4      3939
Sex of child
  Male                         61.3  61.1  55.8  46.7  61.1  55.8  46.7    51.8   43.7  35.6      3427
  Female                       56.8  56.7  50.9  42.9  56.8  51.0  42.9    47.2   40.3  40.2      3288
Residence
  Urban                        81.3  81.1  76.5  68.1  81.2  76.6  68.1    70.3   63.0  14.9      1113
  Rural                        54.7  54.6  48.8  40.2  54.6  48.8  40.2    45.4   37.9  42.4      5602
Region
  North-west                   55.8  56.4  50.6  42.0  56.4  50.6  42.0    46.9   39.3  41.0      5793
  South-east                   79.6  75.4  71.1  62.4  75.4  71.1  62.4    66.5   59.4  18.3       922
Mother's education
  No education                 56.5  56.6  51.0  42.5  56.6  51.0  42.5    47.1   39.8  40.4      5836
  Primary                      84.2  81.8  77.8  69.2  81.8  77.8  69.2    73.4   65.1  14.6       383
  More than primary            89.1  87.3  83.8  75.2  87.3  83.8  75.2    80.5   72.2   7.9       211
  Information not collected    59.3  57.3  49.8  37.1  57.3  49.8  37.1    47.6   35.0  36.6       202
Total                          59.1  59.0  53.4  44.8  59.0  53.4  44.8    49.6   42.0  37.9   6715(b)



BCG, bacille Calmette-Guérin; DPT, diphtheria-pertussis-tetanus.

Source: Macro International Inc. Demographic and health surveys, 1991-92. Reproduced by permission.

(a) Children who are fully vaccinated (that is, those who have received BCG, measles and three doses of DPT and polio
vaccines).

Note: The DPT coverage rate for children without a written record is assumed to be the same as that for polio vaccine,
since mothers were specifically asked whether the child had received polio vaccine. For children whose information was
based on the mother's report, the proportion of vaccinations given during the first year of life was assumed to be the
same as for children with a written record of vaccination.

(b) Editors' note: Differences between the total numbers of children accounted for under the different "Characteristics"
headings are not explained in the original source.



• Which tests should be used to determine whether the proportion of males 
who are fully vaccinated is different from the proportion of females who are 
fully vaccinated? 

• Identify which of the variables show significant differences in the proportions 
of fully vaccinated children. 

• Give three reasons why your conclusions may not be correct either medically 
or statistically. 
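As a teacher's check on these questions, the sketch below applies the z-test for two independent proportions to three comparisons of the "All" (fully vaccinated) column, converting percentages to proportions and taking subgroup sizes from the "Number of children" column. It treats each subgroup as a simple random sample and ignores any weighting or clustering in the survey design; that is an assumption, and standard errors that allow for the survey design would usually be larger. The same comparisons could equally be made with a χ² test on the corresponding 2 × 2 tables.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """z-test for two independent proportions, using the pooled SE under H0."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, math.erfc(abs(z) / math.sqrt(2))  # two-tailed p-value

# Proportion fully vaccinated ("All" column) and number of children, Table 9.2
comparisons = {
    "Male vs female": (0.437, 3427, 0.403, 3288),
    "Urban vs rural": (0.630, 1113, 0.379, 5602),
    "South-east vs north-west": (0.594, 922, 0.393, 5793),
}
results = {name: two_proportion_z(*args) for name, args in comparisons.items()}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:.2f}, p = {p:.2g}")
```

Even the smallest of these differences (boys versus girls, about 3 percentage points) comes out statistically significant because the samples are so large, which is a useful prompt for discussing medical versus statistical significance.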



■ Explain the different errors that can be made in a statistical test of a 
hypothesis. 



HANDOUT 9.1



Definitions of new terms and concepts 



1-tailed and 2-tailed tests: When the difference being tested for significance is not specified in direction (that is,
the test takes no account of whether $X_1 < X_2$ or $X_1 > X_2$), the probabilities in both tails of the sampling
distribution are used in the test: a 2-tailed test is required. When the direction of the difference is specified
beforehand (when $X_1 < X_2$, but not $X_1 > X_2$, is being tested against the null hypothesis $X_1 = X_2$), a
1-tailed test is appropriate, because we are concerned only with the probability $P(X_1 < X_2)$ and not $P(X_1 > X_2)$.

Level of significance: The probability of a difference arising purely by chance below which the difference is
considered sufficiently "unlikely" to be declared statistically significant (conventionally 0.05). It is also the
probability of wrongly rejecting the null hypothesis when it is true (a type 1 error).

Null hypothesis: The hypothesis of "no difference" or, more correctly, the hypothesis that the observed difference is
entirely due to sampling error, that is, that it occurred purely by chance. In a test of significance, the "null
hypothesis" is postulated to establish the basis for calculating the probability that the difference occurred
purely by chance. When the difference is not significant, the null hypothesis is not rejected; when the difference
is significant, the null hypothesis is rejected in favour of other hypotheses about the causes of the
difference. Note that the null hypothesis is never proved completely right or wrong, or true or false, but is only
rejected or not rejected at the probability level of significance concerned, for example, 0.05 or 0.01.

p-value: The probability, calculated on the assumption that the null hypothesis is true, of obtaining results as extreme
as, or more extreme than, those observed in the study.

Statistical significance: The judgement that a result is unlikely to be due to chance alone, that is, that its p-value is
below the chosen level of significance.

Type 1 and 2 errors: Type 1 error is the risk of erroneously rejecting a null hypothesis that is really true. Type 2 error 
is the chance of erroneously failing to reject a null hypothesis that is, in fact, false. 
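The definitions of 1-tailed and 2-tailed tests and of the type 1 error can be made concrete with a short simulation. This sketch assumes a z statistic with a standard normal sampling distribution; the function names, sample size (30 per group) and number of trials are arbitrary choices for illustration. The simulation draws both samples from the same population, so the null hypothesis is true by construction and every rejection is a type 1 error.

```python
import math
import random

def p_values(z):
    """Return (1-tailed, 2-tailed) p-values for a standard normal statistic z.

    The 1-tailed value is the upper-tail probability P(Z >= z), appropriate
    when the alternative hypothesis specifies a positive difference.
    """
    one_tailed = 0.5 * math.erfc(z / math.sqrt(2))
    two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return one_tailed, two_tailed

def type1_error_rate(trials=10000, n=30, seed=1):
    """Proportion of 2-tailed z-tests at the 5% level that reject a true H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]  # same population, so H0 is true
        z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)  # population SD known to be 1
        if abs(z) > 1.96:
            rejections += 1
    return rejections / trials

print(p_values(1.96))      # roughly (0.025, 0.05)
print(type1_error_rate())  # close to 0.05, the level of significance
```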
HANDOUT 9.2



Worked example of the z-test for comparing 
two proportions 



Data situation 

A rural health survey investigated 124 households in a village and recorded their sources of water supply. By reviewing
the village health centre's morbidity records for the three months before the survey, it was possible to identify
household members with a history of diarrhoeal episodes. Of the 88 households that used the river for water supply, 49
had episodes of diarrhoea, compared with 10 of the 36 households using the well. There was no piped water in this
village. Is there a statistically significant difference in the proportions with episodes of diarrhoea between the
households using river and well water supplies?

Solution 

Null hypothesis: there is no difference in the proportion with episodes of diarrhoea between household members 
using river or well water supplies. 

Alternative hypothesis: there is a difference in the proportion of diarrhoea episodes as a result of different sources of 
water supply. (Note that this is a 2-tailed test as no direction is indicated for the difference in episodes of diarrhoea.) 

Level of significance: 0.05. 

Test statistic: The z-test for proportions is chosen as appropriate here:

$z = (p_1 - p_2)/SE(p_1 - p_2)$,

where $SE(p_1 - p_2)$ is the standard error of the difference in proportions:

$SE(p_1 - p_2) = \sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}$ (if we assume unequal variances)

or $SE(p_1 - p_2) = \sqrt{p(1 - p)(1/n_1 + 1/n_2)}$ (pooled estimate),
where $p = (r_1 + r_2)/(n_1 + n_2)$;

$r_1$ and $r_2$ are the numbers with the attribute (in this case, episodes of diarrhoea) in each group;
$n_1$ and $n_2$ are the sample sizes in each group.

In our data situation,
$r_1 = 49$, $n_1 = 88$, $r_2 = 10$, $n_2 = 36$

$p_1 = 49/88 = 0.5568$, $1 - p_1 = 39/88 = 0.4432$
$p_2 = 10/36 = 0.2778$, $1 - p_2 = 26/36 = 0.7222$

Using the unequal-variance standard error,

$z = (0.5568 - 0.2778)/\sqrt{(0.5568 \times 0.4432)/88 + (0.2778 \times 0.7222)/36} = 3.048$.



Conclusion 

Checking with the table of the normal distribution shows that the critical value of z at the 5% level (2-tailed) is
1.96. Since the calculated z exceeds this value, we reject the null hypothesis that the proportion with diarrhoeal
episodes is the same in the two groups of households using different sources of water supply. The difference in the
proportion with episodes of diarrhoea is unlikely to be due to chance: p < 0.05 (indeed, since z also exceeds 2.58,
p < 0.01). The household members using the river appear to have statistically significantly more episodes of
diarrhoea than those using the well. However, further investigations would be needed to establish a causal relationship.
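The arithmetic of this handout can be checked with the short sketch below, which computes z with both of the standard errors given above. It uses unrounded proportions, so the unequal-variance z comes out at about 3.05, slightly different from a hand calculation with four-decimal proportions. The pooled form, which is the standard error implied by the null hypothesis of equal proportions, gives a smaller z (about 2.82); both exceed 1.96, so the conclusion does not change.

```python
import math

r1, n1 = 49, 88   # river: households with diarrhoea, total households
r2, n2 = 10, 36   # well: households with diarrhoea, total households
p1, p2 = r1 / n1, r2 / n2

# Unequal-variance standard error, as used in the hand calculation
se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Pooled standard error, computed under H0: both groups share one proportion p
p = (r1 + r2) / (n1 + n2)
se_pooled = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

z_unpooled = (p1 - p2) / se_unpooled
z_pooled = (p1 - p2) / se_pooled
print(f"unpooled z = {z_unpooled:.3f}, pooled z = {z_pooled:.3f}")
```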

Note: These data can also be tested by the χ² test, but the results have to be presented in a 2 × 2 contingency table,
as shown below.



Table 9.3 Three-month history of diarrhoeal episodes

                            Number of households according to water supply
Status                        River      Well      Total
No diarrhoea                     39        26         65
Diarrhoea                        49        10         59
Total                            88        36        124
Percentage with diarrhoea      55.7      27.8       47.6



The hypothesis to be tested will now be that of no association between diarrhoeal episodes and source of water
supply. The data in fact indicate that an association exists between diarrhoeal episodes and source of water in the
village (with diarrhoeal episodes in 55.7% and 27.8% of households using river and well water, respectively).
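The χ² test of association for Table 9.3 can be sketched as follows. Each expected frequency is computed under the null hypothesis as row total × column total / grand total, and χ² sums (O - E)²/E over the four cells; Yates' correction, when requested, reduces each |O - E| by 0.5. With 1 degree of freedom the p-value is obtained from the normal distribution, because χ² with 1 df is the square of a standard normal deviate. The function name chi_square_2x2 is illustrative only. Without the correction, χ² (about 7.98) equals the square of the pooled z for these data.

```python
import math

def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-squared test of association for the 2x2 table [[a, b], [c, d]] (1 df)."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # frequency expected under H0
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # Yates' correction for continuity
            chi2 += diff ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # upper-tail probability for 1 df
    return chi2, p

# Rows: diarrhoea, no diarrhoea; columns: river, well (Table 9.3)
chi2, p = chi_square_2x2(49, 10, 39, 26)
chi2_y, p_y = chi_square_2x2(49, 10, 39, 26, yates=True)
print(f"chi-squared = {chi2:.2f} (p = {p:.4f}); with Yates = {chi2_y:.2f} (p = {p_y:.4f})")
```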



HANDOUT 9.3



Worked example of the t-test



The following data are from a study to compare the mean concentration of lead (in mg/100 g) in the blood of a group
of workers in a battery plant (exposed) with that of a group of workers in a textile factory (not exposed).



Table 9.4 Mean concentration of lead (in mg/100 g) in the blood of
workers in a battery plant and a textile factory

Battery workers ($X_1$)    Textile factory workers ($X_2$)
0.082                      0.040
0.080                      0.035
0.079                      0.036
0.069                      0.039
0.085                      0.040
0.090                      0.046
0.086                      0.040



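Battery workers: $\sum X_1 = 0.571$; $\sum X_1^2 = 0.046847$; $\sum x_1^2 = 0.0002697143$; $s_1^2 = 0.0000449524$;
$s_1 = 0.0067047$; $\bar{X}_1 = 0.08157$; $n_1 = 7$

Textile factory workers: $\sum X_2 = 0.276$; $\sum X_2^2 = 0.010958$; $\sum x_2^2 = 0.0000757143$; $s_2^2 = 0.0000126190$;
$s_2 = 0.0035523$; $\bar{X}_2 = 0.03943$; $n_2 = 7$

where $x_1 = X_1 - \bar{X}_1$ and $x_2 = X_2 - \bar{X}_2$ are deviations from the group means, so that, for example,
$\sum x_1^2 = \sum X_1^2 - (\sum X_1)^2/n_1$ and $s_1^2 = \sum x_1^2/(n_1 - 1)$.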

We find

$s^2(\text{pooled}) = (\sum x_1^2 + \sum x_2^2)/(n_1 + n_2 - 2) = 0.000028786$

and $SE_d = s\sqrt{1/n_1 + 1/n_2} = 0.002868$,

where the suffixes 1 and 2 refer to battery workers and textile factory workers, respectively, s is the square root of
the pooled variance, and $SE_d$ is the standard error of the difference in mean lead concentrations between the two groups.

The null hypothesis ($H_0$) is that there is no difference in the mean lead concentration in the blood of the workers of the
two industries. The alternative hypothesis does not specify the direction of the difference, so a 2-tailed test is used.
We have

$d = \bar{X}_1 - \bar{X}_2 = 0.042143$.

The difference is tested against zero, with

$t = d/SE_d$, with $(n_1 + n_2 - 2)$ degrees of freedom
$= 14.7$, with 12 degrees of freedom; p < 0.001 (see Table B.2 of Annex B).

The null hypothesis is therefore rejected. There is evidence of a significant difference in the mean lead concentration in 
the blood of the workers of the two industries. 
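The whole calculation can be reproduced from the raw data in Table 9.4 with Python's standard statistics module, as in the sketch below. It assumes equal population variances (the pooled-variance t-test used above). Because the standard library has no t distribution, the sketch compares t with the 2-tailed 0.1% critical value for 12 degrees of freedom (4.318, from standard t tables) instead of computing an exact p-value; the function name pooled_t is illustrative only.

```python
import math
import statistics

battery = [0.082, 0.080, 0.079, 0.069, 0.085, 0.090, 0.086]  # X1, Table 9.4
textile = [0.040, 0.035, 0.036, 0.039, 0.040, 0.046, 0.040]  # X2, Table 9.4

def pooled_t(x1, x2):
    """Two-sample t statistic with a pooled variance estimate (equal variances assumed)."""
    n1, n2 = len(x1), len(x2)
    # statistics.variance uses the n - 1 denominator, i.e. s^2 = sum(x^2) / (n - 1)
    sp2 = ((n1 - 1) * statistics.variance(x1)
           + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    se_d = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    d = statistics.mean(x1) - statistics.mean(x2)
    return d, se_d, d / se_d, n1 + n2 - 2

d, se_d, t, df = pooled_t(battery, textile)
print(f"d = {d:.6f}, SE_d = {se_d:.6f}, t = {t:.1f}, df = {df}")

T_CRIT = 4.318  # 2-tailed critical t for p = 0.001 with 12 df, from standard tables
print("p < 0.001" if abs(t) > T_CRIT else "p >= 0.001")
```

If SciPy is available, scipy.stats.ttest_ind(battery, textile), which assumes equal variances by default, should give the same t together with an exact p-value.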



OUTLINE 10 Association, correlation and regression



Introduction to the lesson 

The idea of causal relationships lies behind much medical decision-making, in both the preventive
and therapeutic fields. As much of the evidence for relationships in medical science is
of a statistical nature, students need to understand the statistical basis of such information or 
knowledge about relationships, in order to be able not only to appreciate the limitations of 
conclusions that they read about in the literature, but also to evaluate their own experiences 
more rationally, quantitatively and objectively. 

Objective of the lesson 

The objective of this lesson is to give the students an understanding of the nature of statistical 
evidence for relationships between different characteristics or events in a population, and to 
enable them to use and interpret the statistical methods and indices employed to describe and 
measure such relationships. 

Enabling objectives 

At the end of the lesson the students should be able to: 

(a) Give examples of types of questions concerning health or medicine that are answered by 
analysis of statistical association or correlation. 

(b) Explain the concept of association between two categorical variables. 

(c) Describe a contingency table. 

(d) Carry out the χ² test when required, with the help of reference material.

(e) Explain the concept of relationship between two quantitative variables presented in a scatter 
diagram. 

(f) Distinguish between linear and non-linear relationships.

(g) Interpret the value of a coefficient of correlation. 

(h) Assess the statistical significance of the sample correlation coefficient. 

(i) Explain the concept and application of linear regression.

(j) Plot a regression line when the equation is given. 

(k) Use linear regression for interpolation and prediction. 

(l) Differentiate between statistical and causal relationships.

Required previous knowledge 

The contents of all previous lessons. 
Lesson content

Situations in which analysis of statistical association or correlation can provide 
answers 

• Studies of two or more variables measured on the same subject (or unit of 
inquiry) where the interest is in their relationship 

Association between two categorical variables 

• Cross-tabulation and contingency tables 

• Distinction between contingency tables and other statistical tables 

• Calculation of expected cell frequencies under the null hypothesis 

• Importance of testing statistical significance of the association 



The χ² test
Procedure 

• The null hypothesis 

• Calculation of expected frequencies ( E ) for each cell under the null 
hypothesis 

• The concept of degrees of freedom 

• Calculation of the χ² statistic

• Correction for continuity, for 2 × 2 tables

Limitations 

• Effect of small expected frequencies 

• Applicability only to categorical data 

Interpretation 

• Use of the table of the theoretical distribution of χ² to determine significance



Relationship between two quantitative variables 

(refer to Figures 10.1-10.7 in Handout 10.2)

Pearson coefficient of correlation (r) 

• Scatter diagram and its use in indicating the nature and strength of the 
relationship 

• Linear (Figures 10.1, 10.3, 10.4 and 10.7) and non-linear (Figures 10.2 and 
10.6) relationships 

• Difference in scatter in the case of strong (Figures 10.1 and 10.3) and weak 
(Figures 10.4 and 10.7) relationships 

• Positive (Figures 10.3 and 10.4) and negative (Figures 10.1 and 10.7) relationships

• Pearson correlation coefficient as a measure of strength and direction of 
linear relationship 
Properties

• Unit free (the coefficient r is a pure number, without units)

• Independent of change of origin and scale 

• Lies between -1 and +1

Magnitude, sign and interpretation 

• Meaning of a particular value of r 

• Interpretation of the magnitude and sign of r for a linear relationship 



• Measure of the strength of the linear association between two quantitative variables

Misuses

• Concluding no relationship from zero correlation, while in fact a strong
non-linear relationship may exist

• Unwarranted conclusion from spurious correlation 

• Concluding a cause-effect relationship from a correlation, while it might just 
be an indirect relationship 

• Concluding an agreement between pairs of measurements, while they may 
not have the same values at all points 

Computation of the correlation coefficient 

Using the formula available in textbooks or using programmable calculators or
computers. Different books may give different versions of the formula. The teacher
should decide which version would be the easiest for the students to use.
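One version of the formula, r = Σ(X - X̄)(Y - Ȳ)/√[Σ(X - X̄)² Σ(Y - Ȳ)²], is sketched below. The data are hypothetical, for illustration only. The second call rescales X (a change of origin and scale, as when converting units) to show that r is unchanged. Python 3.10 and later also provide statistics.correlation, which should give the same value.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient computed from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
r_rescaled = pearson_r([2.54 * a + 10 for a in x], y)  # change of origin and scale
print(round(r, 4), round(r_rescaled, 4))  # 0.7746 0.7746
```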

Assessing the statistical significance of a coefficient of correlation 

• Procedure 

• Limitations 

• Degrees of freedom 

• Interpretation 
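The usual procedure for testing whether a sample correlation differs from zero uses t = r√(n - 2)/√(1 - r²) with n - 2 degrees of freedom. The sketch below applies it to a hypothetical r = 0.5 observed in samples of 10 and of 20, to show how sample size affects the conclusion. The 2-tailed 5% critical values of t (2.306 for 8 df, 2.101 for 18 df) are taken from standard t tables; the function name is illustrative only.

```python
import math

def t_for_correlation(r, n):
    """t statistic for H0: population correlation is zero (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical example: the same r = 0.5 observed in samples of different size
critical = {10: 2.306, 20: 2.101}  # 2-tailed 5% critical t for n - 2 df
for n, t_crit in critical.items():
    t = t_for_correlation(0.5, n)
    outcome = "significant" if abs(t) > t_crit else "not significant"
    print(f"n = {n}: t = {t:.2f} with {n - 2} df, {outcome} at the 5% level")
```

In both cases r² = 0.25, so only a quarter of the variation in one variable is accounted for by its linear association with the other; significance says nothing about the strength of the relationship.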

Linear regression 

A regression equation estimates the form of the relationship, that is, how the
dependent variable changes with the independent variable. The concept and
applications of linear regression should be covered, with an explanation of the
terms dependent and independent variables. A description of the regression line
should be given.

Definition and characteristics of the linear regression 

• The equation $Y = bX + c$

• The regression coefficient or slope b, given by $b = \sum(X - \bar{X})(Y - \bar{Y})/\sum(X - \bar{X})^2$

• The intercept c, given by $c = \bar{Y} - b\bar{X}$



Computation of the linear regression

• The computation of b and c 

• Plotting the line on the scatter diagram 
Uses 

• Measure of linear association 

• Interpolation 

• Prediction 

Misuses 

• Extrapolation without assurance that the trend remains the same 

• Using a regression relationship whose slope has been shown to be not
significantly different from zero

• Forgetting that the predicted values are subject to sampling error 

• Concluding that a cause-effect relationship exists, whereas the relationship 
may just be statistical 

• Applying a relationship established in one group of subjects to another group, 
without the assurance that it is applicable to all groups 
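The computation of b and c, and their use for interpolation, can be sketched as follows using the formulas given above. The data are hypothetical, for illustration only, and the prediction is made at a value inside the observed range of X (1 to 5); predicting far outside that range would be the kind of extrapolation listed under misuses. Python 3.10 and later also provide statistics.linear_regression, which returns the same slope and intercept.

```python
def fit_line(x, y):
    """Least-squares slope b and intercept c for the line Y = bX + c."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (v - my) for a, v in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    c = my - b * mx
    return b, c

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, c = fit_line(x, y)
print(f"Y = {b:.2f}X + {c:.2f}")  # Y = 0.60X + 2.20
print(round(b * 3.5 + c, 2))      # interpolated value at X = 3.5: 4.3
```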




Association; bivariate relationship; multi-factorial relationship; net effect; cross-tabulation; χ²;
contingency table; cell frequency; row total; column total; grand total; expected frequency;
independent and dependent variables; linear and non-linear relationships; spurious correlation; agreement;
slope and intercept; regression coefficient; extrapolation and interpolation.



Structure of the lesson 

(a) Introduce the topic by describing the general purpose of correlation and regression analyses. 
Give examples from the current literature on topics of local interest. Illustrate the statistical 
nature of the relationships and the importance of studying those relationships. 

For example: 

• smoking and lung cancer; 

• intake of iron and folic acid in diet or as supplement and haemoglobin level; 

• mother's education and size of family; 

• quality of drinking-water and diarrhoea; 

• height of a person and height of his or her father; 

• energy intake by a woman during pregnancy and birth weight of her child; 

• severity of disease and cure rate. 

(b) Differentiate between the nature of relationships among categorical and quantitative 
variables. Explain the need to have different procedures for the two types of variables. 
(c) Recapitulate the basic principles of hypothesis testing. For the χ² statistic as a means of
testing the statistical significance of the association between two categorical variables, give a
heuristic explanation of the formula for χ², so that the students realize that $(O - E)^2/E$ is a
measure of deviation from independence. Emphasize that E is the cell frequency expected
under the null hypothesis of independence (that is, of no association).

(d) Explain the concept of degrees of freedom by giving actual examples of, say, 2 × 2 and
2 × 3 tables, illustrating the "freedom" to choose frequencies in one and two cells,
respectively, under the constraint of fixed marginal totals. Stress that a significant χ²
indicates only the presence of an association, with no implication for its strength.
Point out that the magnitude of χ² is strongly affected by n. Indicate what further
calculations are required to measure the strength of the association.

(e) Distinguish between linear and non-linear relationships between two quantitative variables 
by giving examples, as shown in Handout 10.2. Emphasize that the coefficient of correlation 
measures only the linear component of the relationship, which may not exist in some cases, 
despite the presence of a strong non-linear relationship. Make liberal use of diagrams to 
illustrate the magnitude and direction of correlation coefficients. 

(f) Use scatter plots to explain the fluctuations around a line, even in the case of a linear
relationship. Where the fluctuations are large, the predictive value of the relationship can be
reduced substantially.

(g) Briefly explain that the statistic $r\sqrt{n - 2}/\sqrt{1 - r^2}$ follows Student's t distribution with
n - 2 degrees of freedom, subject to the normality of either X or Y. This only tests the
hypothesis that the correlation is zero in the population. For testing other values of
correlation, tests based on Fisher's z transformation are required. Explain the role of the sample size
in placing confidence on the value of a coefficient of correlation.

(h) Briefly explain the use of the t-test for the hypothesis of no correlation. This test does not
provide any clue to the magnitude of the correlation. A better indication of the magnitude of
correlation can be obtained by computing $100 \times r^2$, the percentage of the variation
in the dependent variable that is explained by its association with the independent variable. Also
mention that the t-test requires normality of Y, particularly for small samples, and that this
test should not be used indiscriminately.

(i) Discuss the need to obtain the nature of the relationship in the form of an equation. Restrict
the lesson to linear relationships only. 

(j) Come back to the scatter plots used earlier in the context of correlation and illustrate various 
types of regression lines. Use the illustrations in Handout 10.2 to explain the meaning of the 
slope measured by the regression coefficient and of the intercept. Show the equivalence of 
the use of r and b to indicate association. Give examples of situations for preferring one over 
the other. 

(k) Superimpose scatter plots of different variability and explain how variability affects the
reliability of predictions, whether extrapolated or interpolated.

(l) Discuss the uses and misuses of regression lines on the basis of examples chosen from
published literature.



Lesson exercises 

The teacher should give two different kinds of health-related data to the students: one set of
2 × 2 cross-tabulated data on two discrete variables, and the other on two quantitative variables
measured on the same individuals. The exercise should focus on enabling the students to produce
a scatter diagram and to carry out appropriate procedures to test association between the
variables in each of the two situations. The students should also be tested on their ability to
interpret the results of the tests.



■ Give students a 2 × 2 table with relatively small frequencies, so that the $\chi^2$ value is
not significant. Multiply each frequency by 10 so that the proportions remain
the same. Compute $\chi^2$ again and see how dramatically its value and, consequently,
its significance change. An example could be as follows:







                 Favouring sex education
Sex              Yes      No      Total
Male               8       5         13
Female             5       7         12
Total             13      12         25



$\chi^2 = 0.987$, df = 1, p > 0.25. When each frequency is multiplied by 10,
$\chi^2 = 9.87$, df = 1, but now p < 0.01 (p ≈ 0.002).
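The arithmetic of this exercise can be checked with the short sketch below. It uses the uncorrected Pearson χ² formula for a 2 × 2 table (the figures quoted above are the uncorrected values, without Yates' continuity correction) and the fact that, with 1 degree of freedom, χ² is the square of a standard normal deviate. The function names are illustrative.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Uncorrected Pearson chi-square for the table [[a, b], [c, d]]:
    n(ad - bc)^2 / (row1 * row2 * col1 * col2)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2: float) -> float:
    """Upper-tail probability of chi-square with 1 df (two-sided normal tail)."""
    return math.erfc(math.sqrt(chi2 / 2.0))


for scale in (1, 10):
    # Male: 8 yes, 5 no; female: 5 yes, 7 no; every cell multiplied by scale.
    chi2 = chi_square_2x2(8 * scale, 5 * scale, 5 * scale, 7 * scale)
    print(f"x{scale}: chi2 = {chi2:.3f}, p = {p_value_df1(chi2):.4f}")
```

Because multiplying every cell by k multiplies χ² by k, the same proportions can move from "not significant" to "significant" purely through sample size.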



■ Give some scatter plots of known data and ask the students to make an edu- 
cated guess of the magnitude and direction of the coefficient of correlation. In- 
clude among the scatter plots at least one random plot (no correlation) as well as 
at least one with a non-linear relationship. Let the students calculate r to check 
how good their guesses were. 



■ Give some regression lines with different slopes and ask the students to inter- 
pret each of them. Superimpose scatter plots on them with different variability 
and let the students describe the impact of variability on the reliability of the 
conclusions based on the regression equation. 



■ Use the data on age, height and weight of a male preschool child, followed up 
from the age of six months, to draw a scatter diagram and find the best regres- 
sion line for age and weight. 



Age (months)    Height (cm)    Weight (kg)
 6              66.9            7.1
 7              68.5            7.2
12              72.0            7.8
16              77.0            8.3
18              79.0            8.9
22              82.1            9.2
24              82.7            9.5
26              84.2           10.4
30              86.0           11.0
32              86.5           10.8
34              89.5           11.4
35              89.7           11.8
43              95.0           13.0
Definitions of new terms and concepts



Association: 1 The degree of statistical dependence between two or more events or variables. 

Bivariate relationship: Association (or relationship) between two variables. 

Cell frequency: The number of observations in a cell of a contingency table. 

Column total: The total number of observations in a column of a contingency table. 

Contingency table: 1 A tabular cross-classification of data such that subcategories of one characteristic are indicated 
horizontally (in rows) and subcategories of another characteristic are indicated vertically (in columns). 

Dependent variable: In a regression analysis, this is the variable of which the value is thought to be predictable 
from another variable. 

Expected frequency: The number of observations to be expected in a class or cell if the null hypothesis is true.

Extrapolation: The use of the regression line to predict a value of the dependent variable from that of the indepen- 
dent variable outside the range of values actually observed. 

Grand total: The total number of observations cross-classified in a contingency table. 

Independent variable: The variable, in a regression analysis, of which the value is thought to be predictive of 
another variable. 

Interpolation: The use of a regression line to estimate a value of the dependent variable from that of the independ- 
ent variable within the range of values actually observed. 

Linear relationship: In a regression analysis, when the mathematical model describing the dependent variable in 
terms of the independent variable is in the form of a straight line. 

Multi-factorial relationship: Association (or relationship) between several factors or variables. 

Non-linear relationship: When the form of the model describing y in terms of x is not a straight line.

Regression analysis: 1 Given data on a dependent variable y and an independent variable x, regression analysis 
involves finding the "best" mathematical model (within some restricted form) to describe y as a function of x 
or to predict y from x. 

Regression coefficient(s): For a linear regression, these are the estimated slope and the intercept of the straight 
line describing the dependent variable as a function of the independent variable. 

Row total: The total number of observations in a row of a contingency table. 

Spurious correlation: 1 An association between two variables that may be artefactual, fortuitous, false or due to all 
kinds of non-causal associations resulting from chance or bias. 



1 From Last JM (ed.). A dictionary of epidemiology, 3rd ed. New York, Oxford University Press, 1995.
HANDOUT 10.2

Diagrammatic presentation of different types of correlations

Figure 10.1 Strong negative correlation (r = -1)

Figure 10.2 Non-linear

Figure 10.3 Strong positive correlation (r = 1)

Figure 10.4 Positive correlation (0 < r < 1)

Figure 10.5 No correlation (r = 0)

Figure 10.6 Non-linear

Figure 10.7 Negative correlation (-1 < r < 0)
PART II

Health statistics, including
demography and vital statistics

OUTLINE 11 Censuses and vital registration



Introduction to the lesson 

Before we can assess the magnitude of the public health problem posed by a specific disease, or 
the impact of an intervention programme, we must have an idea of the size of the community 
we are dealing with, its composition with respect to various demographic characteristics, and 
the magnitude of changes in relation to vital events (births and deaths). 

Objective of the lesson 

The objective of this lesson is to familiarize the student with the sources of data on the size of the 
population and its composition. 

Enabling objectives 

At the end of the lesson the students should be able to: 

(a) Explain the concept of population size, and how it is possible to define the population of a 
specified geographical area at a specified time in different ways. 

(b) Give a brief history of censuses in the country. 

(c) Summarize the steps in the organization of local population censuses (with reference to a 
field research project) necessary to achieve complete coverage without overlapping 
information. 

(d) List the items of information obtainable from the latest census that are relevant to the medi- 
cal field. 

(e) State and elaborate on the uses of these items with examples from the latest national popu- 
lation census. 

(f) Discuss the reliability and the limitations of census data. 

(g) Define vital events and vital statistics.

(h) Describe the vital registration system.

(i) State the reasons for recording, reporting and registering births and deaths.

(j) Discuss the role doctors and other health workers should play in the system of recording, 
reporting and registering births, deaths and fetal deaths, as it operates in the country. 

(k) Discuss the reliability of, and reasons for shortcomings in, birth and mortality data. 

(l) Suggest improvements in the vital registration system.

Required previous knowledge 

Lesson content of Outlines 2 and 3. 

Lesson content

Census of population 

Reasons for census taking 

A census provides an indication of population size, that is, the total number of 
persons in a given area. It can be used to demonstrate variation in population 
sizes between countries, and between geographical and administrative subdivi- 
sions of a country. Reasons for census taking are: 

• to provide data for use in planning various services, including health 
services; 

• to determine denominators of indices of health; 

• for administrative and political purposes. 

Types of population 

De facto population: the actual population present in the census area on census 
day; also known as the "present-in-area" or "enumerated" population. Features 
of this measure of population are: 

• it avoids a distinction between temporary and permanent residence; 

• people in transit present problems for enumeration and may be missed; 

• it is not easily open to conscious manipulation; 

• it may give a false impression of size for areas with high migration or high 
seasonal mobility; hence, the choice of date of the census is critical. 

De jure population: all permanent residents who habitually live in the 
census area; also known as the "resident" population. Features of this measure 
are: 

• it requires a definition of "permanent residence" that distinguishes it from
"temporary" residence, which may be difficult in a highly mobile population;
there may be confusion with "usual" or "legal" residence, or place of
domicile;

• residents temporarily away present problems for enumeration and can easily 
be missed; 

• it is subject to possible biases in implementation of residential criteria; 

• it is technically free from the influence of short-term or seasonal mobility or
migration, but for that reason may not reflect the size of the population actually
present in the area at a given time.
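The difference between the two definitions can be made concrete with a toy tally. The sketch below is hypothetical throughout: the record fields and the four example persons are invented for illustration, and a real census applies far more detailed residence rules.

```python
from dataclasses import dataclass


@dataclass
class Person:
    label: str
    present_on_census_day: bool  # physically in the census area on census day
    usual_resident: bool         # habitually lives in the census area


def de_facto_count(people: list[Person]) -> int:
    """Everyone present in the area on census day, resident or not."""
    return sum(p.present_on_census_day for p in people)


def de_jure_count(people: list[Person]) -> int:
    """All usual residents, including those temporarily away."""
    return sum(p.usual_resident for p in people)


# Hypothetical listing for one small census area.
people = [
    Person("resident at home", True, True),
    Person("resident away on a trip", False, True),
    Person("visiting relative", True, False),
    Person("seasonal farm worker", True, False),
]
print("de facto:", de_facto_count(people), "de jure:", de_jure_count(people))
```

The visitor and the seasonal worker inflate the de facto count, while the absent resident is counted only under the de jure definition; in an area with heavy seasonal movement the two totals can differ considerably.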

History of population censuses in the country 

Organizational steps for national and local population censuses 
(including the actual enumeration) 

The following steps should be carried out: 

• statement of the reasons for the census; 

• choice of questions to be asked; 
• design of the census form;

• recruitment of census staff; 

• training of recruited staff; 

• testing of the census form; 

• dealing with financial and legal aspects; 

• enumeration area demarcation, including numbering of houses; 

• population enumeration (preliminary and final). 

Data organization leading to publication of results 
The following steps should be taken: 

• coding and checking; 

• producing computer data files; 

• preliminary tabulations and interim publications; 

• consolidation of results and detailed publications. 

Characteristics of the population 

In a population census, information may be collected about the following char- 
acteristics: age, sex, marital status, area (place) of residence (address), literacy, 
occupation, economic activity, relationship within a household, etc. 

Problems in definition of characteristics 

Problems may arise in defining the following demographic characteristics: 

• age: concept of "completed" years of age, whether stated without proof, stated 
with proof (for example, birth certificate), estimated on the basis of growth 
milestones, estimated on the basis of a calendar of events, guessed; 

• place of residence: permanent residence, temporary residence; 

• occupation: multiplicity of possible occupations, multiple occupations; 

• relationships and marital status: may raise difficulties in some cultures. 

Census reports 

Consider the information available from a census, and the limitations of cen- 
suses as sources of health statistics. 

Uses of census data in the health field 
Census data may be used: 

• for planning health services; 

• as a source of denominators for health indicators. 

Reliability and limitations of census data 

It should be remembered that results are estimates of the actual situation (even 
for the point in time of enumeration). 
Registration of births and deaths

Vital statistics: data on the various vital events of human life, such as births, stillbirths, deaths,
marriages and divorces.

Definitions of technical terms 

Reasons for recording, reporting and registering births and deaths 

These reasons are: 

• for individual (personal) documentation; 

• for legal and civic purposes (for example, establishing citizenship, evidence 
of which may be needed for social and welfare services); 

• to maintain a "balance sheet" of the population. 

The role of the health worker in the registration system 
The health worker may take the role of: 

• an attendant at births and deaths; 

• a certifying official for death; 

• a user of the information on births and deaths; 

• a citizen. 

Reliability and shortcomings 

There may be problems with registration of births and deaths: 

• in applying the definition of a live birth (especially in connection with severe 
congenital malformations); 

• in applying the definition of a late fetal death (especially in connection with 
determining correctly the period of gestation); 

• births followed by early neonatal deaths may not be reported for registration
as births (although they may be recorded as births and deaths if they occur in
a health facility);

• there may be a lack of motivation among the general public to register an 
event; 

• a registration system may not exist, or it may be incomplete or unreliable.

Systems

Systems existing in the country for recording, reporting and registering deaths 
and fetal deaths should be discussed from several points of view. For example, 
consideration may be given to: 

• the system for registration of deaths and stillbirths in the country; 

• possible differences in methods used for recording, reporting and registering 
deaths, from country to country or region to region; 

• the agencies responsible for registering death, according to the legal provisions 
of the country (in Turkey, for example, reporting of deaths to the office of 
population registration is the responsibility of the head of the district). 
It should be emphasized that a uniform system of registering deaths is necessary
for national and international comparisons. 

Shortcomings in the registration system 

Shortcomings in the recording, reporting and registration of deaths and fetal 
deaths may occur through: 

• incomplete data; 

• lack of uniformity in collection or reporting of such data; 

• lack of uniformity in definitions of events and criteria for reporting; 

• lack of accuracy, particularly with respect to age at, and cause of, death. 

Improvement of the registration system 

The following are a number of suggestions for possible improvement of the sys- 
tem (the discussions should be focused on ways of improving the existing sys- 
tem of the country): 

• compulsory registration of deaths; 

• establishment of a uniform system throughout the country; 

• health education by the primary reporting agency to promote recognition of 
the significance and usefulness of such data; 

• use of special sample surveys to check the extent and accuracy of registration 
of deaths by regular agencies; 

• training of persons certifying deaths. 






Census; census area; census unit; enumerator; intercensal period; population census; de facto 
census; de jure census; population size; household; characteristics of population; reticulation; 
fetal death (early, intermediate, late); live birth; stillbirth; vital events. 



Structure of the lesson 

Explain the importance of population data for planning and organizing health services, and how
the population structure affects the need for, and utilization of, the various medical and health
services.

(a) Give a brief history of census taking in the country, mentioning when the first census took 
place, the population groups covered, what the census results were used for and the subse- 
quent developments. 

(b) Discuss the methods of obtaining reliable census and registration data, and the associated 
problems. Also discuss the use of population data. 

(c) Discuss the types of sources of population data, their advantages and disadvantages. 

(d) Arrange for census publications from the Census Office to be made available to students as a 
source of further information on national censuses. 
(e) Discuss why standardization of definitions of census characteristics and vital events is important
for studies of patterns, trends and differentials.

(f) Discuss international recommendations on the definitions of census characteristics and vital 
events, pointing out whether they differ from those used in the country. 



Lesson exercises 

Class exercises should test the students' grasp of the various sources of population data, their 
limitations and associated problems. Questions should also examine their knowledge and 
understanding of the various terms used in connection with population censuses and vital reg- 
istration systems. Exercises should furthermore test their ability to identify appropriate sources 
of data needed to solve particular health problems. 

Population census 

■ What is the concept of the population of a country, in terms of population 
size and types? 



■ What is: 

• a census? 

• a census area? 

■ What are the relative advantages and disadvantages of taking a census of de 
facto and de jure populations? 



■ What are the necessary steps in the organization of a local population 
census, with special regard to completeness of coverage and non-duplication of 
information? 



■ Describe the system of birth and death registration in your country, pointing 
out any strong and weak areas of the system. 



Registration of births and deaths 

■ What is meant by registration of births and deaths, and what do you under- 
stand by the terms "live birth", "stillbirth", "fetal death" and "death"? 



■ Why should births and deaths be reported, recorded and registered? 



■ What is the system for reporting, recording and registering births and deaths 
in the country? 



■ What role should health workers play in the system? 



■ What are the limitations of the data on births and deaths? 



■ Describe how information on population and the registration of vital events 
is used. 
Definitions of new terms and concepts:
population censuses 



Census: Count (enumeration) of items. 

Census area: The defined area in which the census is to be done. 

Census unit: The smallest area into which the census area is divided for administrative and data collection purposes.

De facto census: The method of counting the actual population present in the census area on census day.

De jure census: The method of counting all permanent residents who habitually live in the census area.

Enumerator: The person who carries out the enumeration.

Household: A group of persons living together in a house and sharing common food arrangements. 

Intercensal period: Time between two censuses; usually 10 years for national population censuses. 

Local population census: A population census covering small defined areas. 

National population census: A population census covering the whole country. 

Population census: The total process of collecting, compiling and publishing demographic, economic and social 
data pertaining, at a specified time or times, to all persons in a country or delimited territory. 1 

Population characteristics: The socio-economic-cultural structure of the population in an area. 

Population size: The total number of persons who make up the defined population in a specified area at a specified 
time. 

Reticulation: The determination of boundaries of census areas, units, and other subdivisions in the country. 



1 United Nations Statistical Office. Handbook of population census methods, Vols I-III. New York, 1958-59 (Studies in Methods, Series F, No. 7).
HANDOUT 11.2



Definitions of new terms and concepts: 
registration of births and deaths 



The definitions given here are abstracted from the WHO Technical Report Series, No. 25, 1950, and the tenth revision
of the ICD. 1 (The definitions used in the particular country concerned, if different from these, should also be included
here.)

Fetal death: Death prior to the complete expulsion or extraction from its mother of a product of conception, irrespec- 
tive of the duration of pregnancy. The death is indicated by the fact that after such separation the fetus does 
not breathe or show any other evidence of life, such as beating of the heart, pulsation of the umbilical cord, or 
definite movement of voluntary muscles. 

Fetal deaths are subdivided by period of gestation (measured from the beginning of the last menstruation) as 
follows: early fetal death, intermediate fetal death, late fetal death. 

Early fetal death: A fetal death that occurs before 20 completed weeks of gestation. 

Intermediate fetal death: A fetal death that occurs at or after 20 completed weeks of gestation but before 
28 weeks. 

Late fetal death: A fetal death that occurs at 28 completed weeks of gestation and over. 

Live birth: The complete expulsion or extraction from its mother of a product of conception, irrespective of the 
duration of pregnancy, which, after such separation, breathes or shows any other evidence of life, such as 
beating of the heart, pulsation of the umbilical cord, or definite movement of voluntary muscles, whether or 
not the umbilical cord has been cut or the placenta is attached. Each product of such a birth is considered live 
born. 

Stillbirth: A late fetal death, that is, a fetal death that occurs at or after 28 completed weeks of gestation. 



1 International Statistical Classification of Diseases and Related Health Problems, Tenth revision (ICD-10). Vol. 2. Geneva, World Health
Organization, 1993.
OUTLINE 12 Measurement of morbidity



Introduction to the lesson 

The role of medicine is to improve the health status of individuals, through either clinical 
medicine or public health. Planning for health services should be based on the health needs of 
defined populations, which can be determined in part by morbidity and mortality levels. Monitor- 
ing of changes in the health patterns of populations, and evaluation of health services and 
programmes, are based largely on morbidity data. 

Objective of the lesson 

This lesson aims at enabling the students to identify the sources of information on morbidity and 
disability in the country, and to summarize these data using appropriate indices. 

Enabling objectives 

At the end of the lesson the students should be able to: 

(a) Explain the difficulties inherent in attempting to define and measure health, morbidity and
disability.

(b) List the sources of local and national morbidity and disability data.

(c) Indicate the types of data obtainable from each listed source.

(d) Indicate the completeness and accuracy of the data from each listed source.

(e) Suggest briefly how the stated completeness and accuracy could be improved.

(f) Define disease prevalence and incidence, differentiating between point and period
prevalence, and between persons and episodes.

(g) Differentiate between, and use, the indices proportion, ratio and rate.

(h) Identify the correct numerators and denominators (population at risk) for incidence and
prevalence, for a given set of morbidity data.

(i) Compute and interpret crude and specific morbidity proportions, ratios and rates, for a given
set of appropriate data.

Required previous knowledge 

Lesson content of Outlines 2, 3 and 11, and some knowledge of the organization of the country's 
health service. 



Lesson content 

Definition and measurement of health, morbidity and disability 

• Definition of health, morbidity and disability (see Handout 12.1)

• Measurement of health, morbidity and disability through personal statements,
observation of performance and activity, physical examination, and laboratory
examination

Difficulties of defining and measuring health, morbidity and disability 

• Problems of case definition 

• Variation in perception of sickness among individuals and cultures, and with age

• Inconsistencies in diagnostic procedures 

• Reporting inadequacies and constraints 

• Inadequacies of health care workers 

Sources of local and national morbidity and disability data 

• Routine health service records (including periodic school health examina- 
tions) provide, for example, general morbidity and disability data, by 
diagnosis or symptomatology, in accordance with the sophistication of the 
institution 

• Routine data collection and notification systems of government and private 
institutions provide data on new cases of communicable diseases, etc. 

• Occupational health institutions provide occupation-related morbidity and 
disability data 

• Patients' groups (for example, people with diabetes) provide detailed 
disease-specific or condition-specific data on individual patients 

• Disease registries (for example, for cancer or mental disorders) provide
detailed information on the group of diseases or conditions covered by
the registry

• Surveillance records of selected diseases (primarily for detection of outbreaks) 
provide, for example, information on the time course of diseases under 
surveillance 

• Reports from volunteer workers contain secondary and generally crude data 
on morbidity and disability 

• Reports to ministries of health and international organizations contain sum- 
mary data on morbidity and disability, usually also with demographic data on 
the reference populations 

Completeness and accuracy of data 

Data may be incomplete or inaccurate for the following reasons. 

• Routine health service records: 

— data are only available on those seeking the services of the health 
institutions; 

— the reference population is undefined. 






• Health institutions: 

— the health institutions are not primarily data collection centres, but are 
service units; hence there are likely to be undetected imperfections in the 
collected data. 

• Occupational health institutions: 

— there may be differential coverage of the population served by the 
occupational health institutions (for example, data at entry into service 
will most likely be available on employees only, and not on their 
dependents); 

— some industries may have an unstable (mobile) population (for example, 
agricultural industries). 

• Patients' groups and registries: 

— data are only available if these patients use any health services; coverage 
of the registries may be unsatisfactory. 

• Disease registries: 

— most registries have data only from areas or health centres with special 
facilities relevant to the diagnosis of the diseases covered by the registry. 

• Surveillance systems: 

— data from notification reports usually do not refer to a specific reference 
population, and their accuracy depends on the conditions prevailing at 
the time of notification; 

— data from special surveys are usually available for a single point in time; 
hence they may not be useful for study of morbidity and disability 
patterns over time; 

— these systems tend to pick up chronic diseases rather than acute 
conditions. 

• Reports: 

— data from reports do not provide information on individuals; 

— being summary data, reports are generalizations of what was observed. 

The usefulness of the data obtained is likely to decrease as one moves up the
hierarchy of health services and reporting agencies.

Suggestions to improve data 

• Strengthen hospital medical records departments through training 
programmes 

• Create awareness among primary health data generators of their important 
role as primary data contributors 

• Encourage all health workers to use health-related information in support of 
their activities 

• Feed information back to health institutions, so that those involved in 
generating the data can see the results of their labours 
• Make reports more comprehensive, so that the summary information is
complete, meaningful and useful 

• Ensure that data elements produce a primary benefit at the level at which they are collected

Common disease indices 

Define and differentiate between the following disease indices, explaining their
particular uses: proportion, ratio and rate.

Examples: 

• If 105 male and 115 female babies are born alive with congenital
malformations in one year, then the proportion of malformed male babies among
children with malformations is:

$105/(105 + 115) = 105/220 \approx 0.48$, or 48%.
• The ratio of male to female congenitally malformed children is: 

105:115 or 1:1.1. 

• The average number of children born with congenital malformations per year, 
for every 1000 live births, is a rate of congenital malformations per year. 
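The three indices can be computed as in the Python sketch below, using the malformation figures above. The function names are illustrative. The outline does not give the year's total number of live births, so the rate helper is shown without applying it to this example.

```python
def proportion(part: int, whole: int) -> float:
    """Part of a group divided by the whole group (the part is included in it)."""
    return part / whole


def rate_per_1000(events: int, population_at_risk: int) -> float:
    """Events in a stated period per 1000 of the population at risk."""
    return 1000 * events / population_at_risk


males, females = 105, 115  # live-born babies with congenital malformations, one year
p_male = proportion(males, males + females)
female_per_male = females / males  # ratio written as 1 : female_per_male
print(f"proportion male = {p_male:.3f}; ratio male:female = 1:{female_per_male:.1f}")

# A rate would also need the year's total live births, which is not given here:
# rate_per_1000(males + females, live_births)
```

The proportion keeps the numerator inside the denominator, the ratio compares two separate groups, and the rate adds a time period and a population at risk.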

Numerators and denominators of morbidity indices 

The following table gives numerators and denominators of morbidity indices. 



Index         Numerator                                    Denominator
Proportion    People with the disease                      All people (with and without the disease)
Ratio         People with the disease                      People without the disease
Rate          People with the disease in a given period    All people (with and without the disease)



Prevalence and incidence 

Define prevalence (point and period) and incidence, describing their applica- 
tions. Mention should also be made of their interrelationship, and their depend- 
ence on the duration of the disease. 
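A widely used approximation expresses this interrelationship. It holds only under assumptions that the outline does not state: a stable population, incidence and mean duration roughly constant over time, and low prevalence.

```latex
P \approx I \times \bar{D}
```

Here P is the prevalence (a proportion), I the incidence rate per person per unit of time, and D̄ the mean duration of the disease in the same time units. For a hypothetical example, an incidence of 2 per 1000 per year and a mean duration of 5 years give a prevalence of about 10 per 1000; a chronic disease therefore accumulates prevalent cases far beyond its yearly incidence.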

Examples: 

• Disease period prevalence:

— the number of children diagnosed as malnourished in a health centre catch- 
ment area during the year 1994 (old cases starting before the study plus 
new cases starting during the study); 

— the number of accident cases in the surgical ward of a hospital during the 
month of January 1995 (one month period: old cases that were present at 
1 January 1995 plus new cases admitted during January 1995). 

• Disease point prevalence:

— the number of accident cases in the surgical ward of a hospital on a certain 
day, for example, on 15 January 1995. 
• Disease incidence:

— the number of children newly diagnosed as having malnutrition during the
year 1995 in the catchment area of the health centre;

— the number of new accident cases admitted to the surgical ward of the 
hospital during the month of January 1995. 
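The distinction between point prevalence, period prevalence and incidence, and between episodes and persons, can be demonstrated with the Python sketch below. The episode records are hypothetical (they are not the Handout 12.2 data); each record gives a person identifier, the onset date and the end date, with None meaning the episode is still in progress.

```python
from datetime import date

# Hypothetical sickness episodes: (person id, onset date, end date or None).
episodes = [
    ("P1", date(1994, 12, 20), date(1995, 1, 5)),   # old case running into January
    ("P2", date(1995, 1, 10), date(1995, 1, 20)),   # new case, ongoing on 15 January
    ("P2", date(1995, 1, 25), None),                # second episode, same person
    ("P3", date(1995, 1, 16), date(1995, 1, 18)),   # new case, not ongoing on 15 January
    ("P4", date(1994, 11, 1), date(1994, 12, 1)),   # ended before January
]


def ongoing_on(episode, day: date) -> bool:
    """True if the episode had started and not yet ended on the given day."""
    _, onset, end = episode
    return onset <= day and (end is None or end >= day)


def overlaps(episode, start: date, stop: date) -> bool:
    """True if the episode was in progress at any time during [start, stop]."""
    _, onset, end = episode
    return onset <= stop and (end is None or end >= start)


start, stop = date(1995, 1, 1), date(1995, 1, 31)
point_prev = [e for e in episodes if ongoing_on(e, date(1995, 1, 15))]
period_prev = [e for e in episodes if overlaps(e, start, stop)]  # old + new cases
incident = [e for e in episodes if start <= e[1] <= stop]        # onset in January

for label, eps in (("point prevalence", point_prev),
                   ("period prevalence", period_prev),
                   ("incidence", incident)):
    persons = {pid for pid, _, _ in eps}
    print(f"{label}: {len(eps)} episodes, {len(persons)} persons")
```

Dividing these counts by the size of the population at risk turns them into the corresponding proportions or rates; whether episodes or persons are counted must always be stated.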
Health; morbidity; disability; prevalence (point, period, person, episode); incidence (person,
episode); population at risk; proportion; ratio; rate. 



Structure of the lesson 

(a) When presenting the lesson the teacher should use medically (health) oriented examples to 
illustrate the various points. The selected examples should be relevant and meaningful to 
the students. 

(b) The teacher should explain the statistical implications of the definitions of health, morbidity 
and disability. The presentation should cover problems of incompleteness and inaccuracy of 
data, with indications as to how these problems could be solved. 

(c) Discussion of the sources of local and national morbidity and disability data is best done by considering each level of the health system in turn, from the peripheral to the national (central) level. The discussion should include examples, not only of the different data elements collected at each level, but also of the different depths at which data on a characteristic may be collected at the different levels. (Reference may be made to Outline 2.) The teacher should stress that the primary need for morbidity and disability data is for health management at the level at which they are collected.

(d) The discussion of indices of health, morbidity and disability should emphasize their uses 
more than their computation. The distinction between prevalent and incident cases may 
best be demonstrated graphically (an example is shown in Handout 12.2). 



Lesson exercises 

The exercises on measurement of morbidity should be as relevant to the students' environment as possible, as they are among the best vehicles for demonstrating the use of statistical principles in future work. A local data source should be used for the exercises. The exercises should focus on the sources of morbidity data, the problems involved in routine data collection and shortcomings of data sources. The exercises should also aim to help the students practise the computation and interpretation of simple indices of morbidity (proportions, prevalence, incidence, ratios and rates).



■ Describe the disease pattern using data from your country. 



■ Define the prevalence rate of a disease and describe steps to obtain the data 
for the calculation of the rate. 



■ Calculate the age-specific prevalence rate from the data on disease surveillance in your country.
■ The data obtained on absenteeism due to acute upper respiratory system diseases among the employees in a large carpentry factory during the month of January 1995 are given in Handout 12.2. The estimated total number of employees working in this factory is 250.

— Identify which cases should be used to calculate the prevalence and incidence rates, and count the number of sickness episodes and the number of sick employees.

— Calculate the point prevalence for 10 January 1995. 

— Calculate the period prevalence and incidence rates. 
Definitions of new terms and concepts



Disability: Restriction or lack (resulting from an impairment) of ability to perform an activity in the manner or within 
the range considered normal for a human being. 

Handicap: A disadvantage for a given individual, resulting from an impairment or a disability, that limits or prevents 
the fulfilment of a role that is normal for that individual. 

Health: Defined in the Constitution of the World Health Organization as "a state of complete physical, mental and 
social well-being and not merely the absence of disease or infirmity". 

Impairment: Any loss or abnormality of psychological, physiological or anatomical structure or function. 

Incidence: Occurrence of new cases of a specified disease in a specified community during a specified period of time. 

Morbidity: Any departure, subjective or objective, from a state of physiological or mental well-being, whether due to 
disease, injury or impairment.¹

Population at risk: People who stand a chance of contracting a specified disease (for example, during an epidemic 
outbreak). 

Prevalence: A measure of the total number of existing cases (episodes or events) of a disease or condition 
at a specified point in time. (If a period of time is specified, then the resulting disease measure is period 
prevalence.) 

Proportion: Defined as the fraction a/(a + b) for mutually exclusive groups with elements a and b. (The b elements may belong to more than one group, each mutually exclusive of the group with the a elements.)

Rate: A measure of the "speed" at which events are occurring (for example, rate of incidence of a specified disease 
is a measure of the "speed" with which new cases occur in the community). 

Ratio: Defined as the fraction a/b for two mutually exclusive groups with elements a and b (conventionally expressed as 1 : b/a).



¹ WHO Expert Committee on Health Statistics. Sixth Report. Geneva, World Health Organization, 1959 (WHO Technical Report Series, No. 164).
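As a small illustration of the proportion and ratio definitions above, the sketch below applies them to the male and female death totals of Table 13.1 (286 and 218). The function names are illustrative only.

```python
def proportion(a, b):
    """a/(a + b): the share of the combined total that belongs to group a."""
    return a / (a + b)

def ratio(a, b):
    """a/b for two mutually exclusive groups a and b."""
    return a / b

female_deaths, male_deaths = 218, 286   # death totals from Table 13.1

print(f"Proportion of deaths that were male: {proportion(male_deaths, female_deaths):.3f}")  # 0.567
r = ratio(female_deaths, male_deaths)
# The same ratio written in the conventional form 1 : b/a.
print(f"Ratio of female to male deaths: {r:.2f}, i.e. 1 : {1 / r:.2f}")  # 0.76, i.e. 1 : 1.31
```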



TEACHING HEALTH STATISTICS © WORLD HEALTH ORGANIZATION 1999 








HANDOUT 12.2 



Example of data on absenteeism due to acute 
upper respiratory system diseases, Turkey 



The data obtained on absenteeism due to acute upper respiratory system diseases among the employees in a large carpentry factory in Ankara, Turkey, during the month of January 1995 are given below. The estimated total number of employees working in this factory is 250.



[Chart not recoverable: one horizontal bar per sickness episode for each employee (A, B, ...), drawn against the days from 30 December 1994 to 2 February 1995.]

Total (number of employees sick on each day):

December 1994:   30: 8    31: 8
January 1995:     1: 8     2: 8     3: 11    4: 12    5: 12    6: 11    7: 11
                  8: 10    9: 10   10: 8    11: 7    12: 7    13: 7    14: 7
                 15: 7    16: 7    17: 8    18: 8    19: 7    20: 5    21: 6
                 22: 8    23: 9    24: 11   25: 11   26: 11   27: 12   28: 14
                 29: 14   30: 13   31: 11
February 1995:    1: 9     2: 4





Explanatory notes for the table 

(a) Each horizontal bar represents an episode of disease, from beginning to end. 

(b) The length of each bar represents the duration of the disease (or duration of the worker's absence), which is 
written at the end of the bar. 

(c) Some of the employees were sick more than once in the study period. 

(d) The column totals give the number of employees sick on each day. For example, on 20 January 1995, five employees were sick.
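Because each column total counts the employees sick on a given day, dividing it by the number of employees gives a point prevalence. A minimal Python sketch, assuming the January column totals of the table and the estimate of 250 employees; period prevalence and incidence need the individual episodes, not just daily counts, so they are not computed here.

```python
# Number of employees sick on each day of January 1995 (column totals, Handout 12.2).
SICK_IN_JANUARY = [8, 8, 11, 12, 12, 11, 11, 10, 10, 8,      # 1-10 January
                   7, 7, 7, 7, 7, 7, 8, 8, 7, 5,             # 11-20 January
                   6, 8, 9, 11, 11, 11, 12, 14, 14, 13, 11]  # 21-31 January
EMPLOYEES = 250  # estimated number of employees in the factory

def point_prevalence_per_1000(day_of_january):
    """Employees sick on the given January day, per 1000 employees."""
    return SICK_IN_JANUARY[day_of_january - 1] * 1000 / EMPLOYEES

print(point_prevalence_per_1000(10))  # 8 sick out of 250 -> 32.0 per 1000
print(point_prevalence_per_1000(20))  # 5 sick out of 250 -> 20.0 per 1000
```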
OUTLINE 13 Measurement of mortality



Introduction to the lesson 

The measurement of the level of health of a community is usually undertaken by studying 
mortality and morbidity due to different causes. In practice, mortality statistics are easier to 
come by than morbidity data, and can provide much useful information on the disease patterns 
in the community. 

Objective of the lesson 

The objective of this lesson is to discuss the role of mortality statistics in clinical and community 
medicine, the definitions of various indices, and their uses and interpretations. 

Enabling objectives 

At the end of the lesson, the students should be able to: 

(a) Discuss the uses and limitations of mortality statistics in clinical medicine and community 
health. 

(b) List the local and national sources of mortality data.

(c) Describe the local and national systems of recording, reporting and registering deaths and 
fetal deaths. 

(d) Identify the shortcomings in the local and national systems of recording, reporting and registering deaths and fetal deaths in terms of the completeness and reliability of data.

(e) Compute the following mortality indices and explain their uses: 

• crude death rate; 

• age-specific, sex-specific and age/sex-specific death rates; 

• stillbirth rate (late fetal death rate); 

• perinatal mortality rate; 

• neonatal mortality rate; 

• post-neonatal mortality rate; 

• infant mortality rate; 

• maternal mortality rate; 

• disease/cause-specific death rate; 

• case-fatality rate; 

• specific death ratio. 

(f) Describe the limitations of the crude death rate as a comparative index of total mortality.

(g) Explain the meaning and use of a standardized death rate.

(h) Compute, from a given set of data and reference material, a standardized death rate (using 
both direct and indirect methods). 

(i) Explain the meaning of life expectancy.

Required previous knowledge 

Contents of Outlines 2, 11 and 12.



Lesson content 

Mortality data 

• Mortality statistics:

— components of mortality data; 

— summaries of mortality data; 

— aggregation, tabulation and presentation of mortality data by categories 
of special importance for health monitoring (for example, sex, age, 
occupation). 

• The use of mortality and fatality rates in the following contexts: 

— comparative assessment of community health; 

— assessment of health needs of the people and fixing of priorities for action 
in terms of reducing risks of death; 

— remodelling and strengthening health services; 

— evaluation of health programmes; 

— measurement of the relative importance of specific diseases as causes of 
death; 

— estimation of the average span of life that an individual of a specified age 
is likely to attain; 

— assessment of the efficacy of a drug or procedure, in a clinical trial, against 
a disease, particularly one for which fatality is high. 

Limitations of mortality data 

Used alone, mortality data do not reveal the levels of health of a group of people who are alive. Detailed cause-specific mortality data are of limited use in the absence of reliable statistics on births, deaths and demographic aspects, preferably by geographical and social subgroups.

Sources of mortality data 

Sources of mortality data include: 

— vital registration system; 

— national sample surveys; 

— special health surveys; 
— health facility records;

— notification of infectious diseases; 

— government health institutions; 

— voluntary health institutions; 

— revenue agencies; 

— police; 

— village/community councils; 

— reports of national and international organizations; 

— census of population; 

— registration systems. 

Systems for death registration 

Refer to Outline 11.

Definitions of mortality rates 

Handout 13.1 gives the definitions for the computation of mortality rates. All 
the rates are usually calculated for one year (but they may be calculated for any 
other specified period). A one-year period is assumed here. 

• The advantages and disadvantages of the crude death rate are that:

— it measures the average risk of death in the population at large;

— it is easy to compute;

— it is often used to compare relative mortality in a given area between two periods that are not too far apart;

— its level is affected by the age and sex composition of the population; hence it can only be used to compare general mortality levels of two populations if they have a similar age/sex composition;

— it takes no account of the fact that the chance of dying varies according to age group, sex, race, occupation, etc.

• The advantages and disadvantages of the age-specific, sex-specific and age/sex-specific death rate are that:

— it measures the risk of death among persons in a specific age and/or sex group;

— it is simple to calculate;

— it can be used to compare the mortality of two populations of the same specific age and/or sex group, even when the age and/or sex compositions of these populations are different;

— it gives the essential components for constructing life-tables; 

— it does not summarize total mortality in a single figure; 

— it takes no account of differences in the population structure in terms of 
race, occupation, religion, etc.; 
— comparison of overall mortality conditions in the two populations is cumbersome, because of the need to compare rates for all the different age groups, and for males and females.
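To see the contrast between the single crude rate and the set of age-specific rates, the following Python sketch computes both from Table 13.1 (both sexes combined). The dictionary layout and function names are illustrative choices, not part of the outline.

```python
# Table 13.1, both sexes: age group -> (mid-year population, deaths in 1994).
TABLE_13_1 = {
    "Under 1": (1337, 165), "1-4": (4526, 34), "5-9": (6491, 4),
    "10-14": (6937, 2), "15-19": (6579, 5), "20-24": (4939, 4),
    "25-29": (4149, 2), "30-34": (3067, 1), "35-39": (2416, 6),
    "40-44": (2601, 7), "45-49": (2645, 9), "50-54": (2176, 23),
    "55-59": (1645, 30), "60-64": (1066, 32), "65-69": (1110, 50),
    "70-74": (644, 41), "75-79": (316, 34), "80-84": (195, 34),
    "85+": (50, 21),
}

def rate_per_1000(events, population):
    return events * 1000 / population

def crude_death_rate(table):
    """All deaths divided by the whole mid-year population, per 1000."""
    population = sum(pop for pop, _ in table.values())
    deaths = sum(d for _, d in table.values())
    return rate_per_1000(deaths, population)

def age_specific_death_rates(table):
    """Deaths in each age group divided by the population of that group, per 1000."""
    return {age: rate_per_1000(d, pop) for age, (pop, d) in table.items()}

print(f"Crude death rate: {crude_death_rate(TABLE_13_1):.2f} per 1000")
for age, rate in age_specific_death_rates(TABLE_13_1).items():
    print(f"{age:>8}: {rate:7.2f} per 1000")
```

The age-specific rates run from about 0.3 per 1000 at ages 10-14 up to 420 per 1000 at 85 years and over, while the crude rate is about 9.5 per 1000; how much each age group contributes to the crude rate depends on its share of the population.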

Meaning of life expectancy 

Life expectancy at a given age is the average number of further years that a person of that age can expect to live if current age-specific death rates continue to apply. The most commonly used indices are expectation of life at birth and at 5 years of age.

Standardization of death rates 

Standardized death rates are rates in which allowance has been made for differences in the composition of the populations (most often their age structure). They are used to compare the mortality experience of two or more populations of different compositions.

Standardized death rates may be computed by: 

— direct method; 

— indirect method. 




Case-fatality rate; crude death rate; death rates (age-specific, sex-specific, age/sex-specific, 
disease/cause-specific); late fetal death rate; leading causes of deaths; life-table; mortality rate 
(infant, neonatal, perinatal, post-neonatal, maternal); mortality statistics; specific death ratio (by 
age and cause); standard population; standardization of death rates; standardized mortality ratio; 
stillbirth rate and ratio; perinatal mortality rate and ratio. 



Structure of the lesson 

(a) Discuss the relative advantages and disadvantages of defining, collecting, and using mortality data, as compared with morbidity data. Explain the relevance and uses of mortality statistics in clinical and community medicine, giving examples.

(b) Discuss the sources of mortality data. Explain the vital registration system in the country and the procedures for reporting, recording and registering deaths and fetal deaths, making reference to the actual record forms involved. Point out the possible sources of error or inaccuracy, and causes of under-reporting or over-reporting; describe the steps that can be taken to evaluate or reduce these errors, and the role of physicians and other health workers in the system.

(c) Explain the uses and definitions of the common mortality indices and their applicability and 
limitations in your country, differentiating where applicable between national practice and 
international recommendations. 

(d) Point out that, for monitoring national health development, cause-specific mortality data are less important than gross indicators of mortality (infant mortality, child mortality, maternal mortality, and overall mortality).

(e) Explain the purpose of a comparative index of total mortality, the limitations of the crude death rate for this purpose, and hence the need for a standardized death rate. Give examples of crude and standardized rates, and discuss their use and interpretation, as well as their limitations. Show, with reference to worked examples, how a standardized rate may be computed, differentiating between the direct and the indirect methods. Emphasize the fictitious nature of standardized death rates, which do not depict the true experience of any population. Standardized death rates are derived purely for comparative purposes.

(f) Discuss the general principle of standardization and its possible application to comparisons 
of other parameters besides total mortality, illustrating with examples of disease-specific 
mortality (for example, comparing cancer mortality between countries) and morbidity (for 
example, comparing incidence or prevalence of a disease between countries). Stress that a 
common standard is a prerequisite for valid comparison of standardized indices. Discuss the 
problem of selecting a standard population, and the acceptability or otherwise of a common 
standard population for all occasions and all countries. 



Lesson exercises 

The possible sources of mortality data, and the quality of each source, should be assessed. National data should be given for the calculation of mortality indices. The exercises should test the students' ability to use the indices to describe the health situation of the country, and to compare their results with those for other countries.



■ Table 13.1 gives the mid-year population by age and sex, and the number of deaths which occurred in each age and sex group in 1994, in the area covered by one health centre in rural Turkey.



Table 13.1 Age and sex distribution of mid-year population and deaths in a rural region of Turkey

                  Population                   Deaths
Age          Male    Female    Total      Male   Female   Total
Under 1       661       676     1337        85       80     165
1-4          2310      2216     4526        19       15      34
5-9          3286      3205     6491         2        2       4
10-14        3506      3431     6937         1        1       2
15-19        3258      3321     6579         3        2       5
20-24        2553      2386     4939         1        3       4
25-29        2542      1607     4149         1        1       2
30-34        1686      1381     3067         0        1       1
35-39        1271      1145     2416         5        1       6
40-44        1288      1313     2601         4        3       7
45-49        1354      1291     2645         3        6       9
50-54        1152      1024     2176        15        8      23
55-59         861       784     1645        20       10      30
60-64         572       494     1066        19       13      32
65-69         589       521     1110        31       19      50
70-74         368       276      644        25       16      41
75-79         175       141      316        20       14      34
80-84          92       103      195        18       16      34
85+            26        24       50        14        7      21
Total       27550     25339    52889       286      218     504
Details of the infant deaths are as follows:

Age at death (days)   Male   Female   Total
0-6                     19       13      32
7-27                     9       12      21
28-364                  57       55     112
Total                   85       80     165



The total number of births occurring in this region in 1994 was 1315, with 20 
stillbirths. Table 13.2 gives the ten leading causes of death by age and sex, in the 
same region. 



Table 13.2 Ten leading causes of death by age and sex in a rural region of Turkey (same region as Table 13.1)

Cause of death              Male   Female   Total
Chronic bronchitis            37       26      63
Senility                      30       18      48
Hypertension                  19       29      48
Ischaemic heart disease       19       11      30
Pneumonia                     17       12      29
Perinatal causes              17       10      27
Intestinal infections         12       10      22
Malignant neoplasm            13        8      21
Nutritional deficiency        11        8      19
Congenital anomalies           9        7      16



Evaluate the mortality situation in this health centre region by using the data given in Tables 13.1 and 13.2, and by using the rates and ratios given in Handout 13.1.



■ Prepare a report to present the findings, giving your interpretation of the 
mortality data. 
HANDOUT 13.1

Definitions of mortality rates



All the following death rates are usually calculated for one year (but they may be calculated for any other specified period).

Crude death rate (CDR): [Total number of deaths occurring in a year x 1000]/Mid-year population. The adjective "crude" refers to the overall death rate with no compensation for the effect of any associated factor, such as age, sex or race.

Age-specific death rate: [Total number of deaths in a specific age (or age group) in a year x 1000]/Mid-year population of the same age (or age group).

Sex-specific death rate: [Total number of deaths in a specific sex group in a year x 1000]/Mid-year population of the same sex group.

Age/sex-specific death rate: [Total number of deaths in a specific age and sex group in a year x 1000]/Mid-year population of the same age and sex group.

Stillbirth (or late fetal death) rate: [Number of stillbirths occurring in a year x 1000]/Total number of births (live births plus stillbirths) in the same year.

Stillbirth (or late fetal death) ratio: [Number of stillbirths occurring in a year x 1000]/Total number of live births in the same year.

Perinatal mortality rate: [(Number of stillbirths) + (number of infant deaths in the first week after birth) in a year x 1000]/Total number of births (live births plus stillbirths) in the same year.

Perinatal mortality ratio: [(Number of stillbirths) + (number of infant deaths in the first week after birth) in a year x 1000]/Total number of live births in the same year.

Neonatal mortality rate: [Number of deaths of infants under 28 days of age in a year x 1000]/Total number of live births in the same year.

Post-neonatal mortality rate: [Number of deaths among infants aged between 28 and 364 days in a year x 1000]/Total number of live births in the same year.

Infant mortality rate: [Number of deaths under one year of age in a year x 1000]/Total number of live births in the same year.

Maternal mortality rate: [Number of female deaths due to complications of pregnancy, childbirth and the puerperium in a year x 1000]/Total number of live births in the same year.

(In the strict sense, this is a maternal mortality ratio. Ideally the denominator should include all deliveries and abortions, but because of lack of data on abortions, only live births are used.)

Disease/cause-specific death rate: [Number of deaths due to a specified disease (cause) occurring in a year x 1000]/Mid-year population.

Case-fatality rate: [Number of deaths due to a given disease or condition occurring in a year x 1000]/Total number of persons who suffered from the same disease or condition in the same year.

Age-specific proportional death ratio: [Number of deaths at a specified age or in a specified age group (usually for 50+) in a year x 100]/Total number of deaths in the same year.

Cause-specific proportional death ratio: [Number of deaths from a specified cause in a year x 100]/Total number of deaths in the same year.
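These definitions can be applied to the Outline 13 exercise data. The Python sketch below assumes that the 1315 births reported for the region include the 20 stillbirths, so that live births number 1295; if 1315 is in fact the number of live births, TOTAL_BIRTHS and LIVE_BIRTHS should be changed accordingly. The variable names are illustrative.

```python
# Exercise data for the health centre region, 1994 (Tables 13.1, 13.2 and infant-death details).
TOTAL_BIRTHS = 1315                        # ASSUMPTION: live births plus the 20 stillbirths
STILLBIRTHS = 20
LIVE_BIRTHS = TOTAL_BIRTHS - STILLBIRTHS   # 1295 under that assumption
INFANT_DEATHS = {"0-6": 32, "7-27": 21, "28-364": 112}  # by age in days, both sexes
MID_YEAR_POPULATION = 52889
TOTAL_DEATHS = 504
DEATHS_CHRONIC_BRONCHITIS = 63             # Table 13.2
DEATHS_AGED_50_PLUS = 265                  # rows 50-54 to 85+ of Table 13.1

def per(numerator, denominator, base=1000):
    return numerator * base / denominator

early_neonatal = INFANT_DEATHS["0-6"]                        # first week after birth
neonatal = INFANT_DEATHS["0-6"] + INFANT_DEATHS["7-27"]      # under 28 days
infant = sum(INFANT_DEATHS.values())                         # under one year

rates = {
    "Stillbirth rate": per(STILLBIRTHS, TOTAL_BIRTHS),
    "Stillbirth ratio": per(STILLBIRTHS, LIVE_BIRTHS),
    "Perinatal mortality rate": per(STILLBIRTHS + early_neonatal, TOTAL_BIRTHS),
    "Perinatal mortality ratio": per(STILLBIRTHS + early_neonatal, LIVE_BIRTHS),
    "Neonatal mortality rate": per(neonatal, LIVE_BIRTHS),
    "Post-neonatal mortality rate": per(INFANT_DEATHS["28-364"], LIVE_BIRTHS),
    "Infant mortality rate": per(infant, LIVE_BIRTHS),
    "Chronic bronchitis death rate": per(DEATHS_CHRONIC_BRONCHITIS, MID_YEAR_POPULATION),
    "Proportional death ratio, 50+ (%)": per(DEATHS_AGED_50_PLUS, TOTAL_DEATHS, base=100),
}
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```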
HANDOUT 13.2



Worked examples on the calculation of 
standardized death rates 



Table 13.3 Mid-year population by age and age-specific death rates for three towns in 1990 



Age group        Town A                  Town B                  Town C
(years)    Population  Death rate  Population  Death rate  Population  Death rate
                       per 1000                per 1000                per 1000
0-4           9222       27.11        6473       24.10       15695       26.00
5-14         19576        2.25       13740        1.97       33316        2.01
15-49        39056        3.00       22458        2.89       61514        2.94
50-59         4156       12.03        6164       11.52       10320       11.63
60-69         2688       29.76        4165       32.17        6853       30.06
70-79         1489       76.56        2795       55.10        4284       60.00
80+            334      137.72         972      110.08        1306      123.28
Total        76521        9.16       56767       12.58      133288       10.50



As seen from Table 13.3, the crude death rates of Towns A and B are 9.16 and 12.58 per 1000, respectively. This difference may be due to differences in the age composition of the two towns. The first step should therefore be to check whether the two towns differ in age composition. If they do, standardized death rates have to be calculated. In this example, the age compositions of the two towns are different.

Direct method of standardization 

Table 13.4 demonstrates the steps to be followed in the computation of standardized death rates for the two towns by the direct method. The population of Town C will be used as the standard population. The standard population and the death rates are taken from Table 13.3. For each age group, the expected number of deaths is calculated by multiplying the standard population by the town's death rate per 1000 and dividing by 1000.



Table 13.4 Standard population and expected deaths: direct method of standardization 



Age group   Standard        Town A                    Town B
(years)     population   Death rate   Expected    Death rate   Expected
                         per 1000     deaths      per 1000     deaths
0-4            15695        27.11        425         24.10        378
5-14           33316         2.25         75          1.97         66
15-49          61514         3.00        185          2.89        178
50-59          10320        12.03        124         11.52        119
60-69           6853        29.76        204         32.17        220
70-79           4284        76.56        328         55.10        236
80+             1306       137.72        180        110.08        144
Total         133288                    1521                     1341



Calculation of the standardized death rate (SDR): 

SDR = [Total number of expected deaths x 1000]/standard population. 

For Town A it is: SDR(a) = (1521/133288) x 1000 = 11.41.

For Town B it is: SDR(b) = (1341/133288) x 1000 = 10.06.
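The direct-method steps can be checked with a short Python sketch using the Table 13.3 figures, with Town C as the standard population. Expected deaths are not rounded per age group here; their totals (about 1520.9 and 1340.8) round to the 1521 and 1341 shown in Table 13.4. The list and function names are illustrative.

```python
# Age groups 0-4, 5-14, 15-49, 50-59, 60-69, 70-79, 80+ (Table 13.3).
STANDARD_POP = [15695, 33316, 61514, 10320, 6853, 4284, 1306]   # Town C
RATES_A = [27.11, 2.25, 3.00, 12.03, 29.76, 76.56, 137.72]      # Town A, per 1000
RATES_B = [24.10, 1.97, 2.89, 11.52, 32.17, 55.10, 110.08]      # Town B, per 1000

def direct_sdr(std_pop, rates_per_1000):
    """Apply a town's age-specific rates to the standard population."""
    expected = [p * r / 1000 for p, r in zip(std_pop, rates_per_1000)]
    sdr = sum(expected) * 1000 / sum(std_pop)
    return sdr, expected

sdr_a, expected_a = direct_sdr(STANDARD_POP, RATES_A)
sdr_b, expected_b = direct_sdr(STANDARD_POP, RATES_B)
print(f"Town A: expected deaths {sum(expected_a):.0f}, SDR {sdr_a:.2f} per 1000")  # 1521, 11.41
print(f"Town B: expected deaths {sum(expected_b):.0f}, SDR {sdr_b:.2f} per 1000")  # 1341, 10.06
```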

Comments 

This result shows that although the crude death rate (CDR) is higher in Town B than in Town A, the standardized death rate for Town B is lower than that for Town A. A careful study of the age compositions of the two towns shows that the proportion of elderly people in Town B is higher than that in Town A, which is the reason for its higher crude death rate. By standardizing for age, we compared the two towns as if they had the same age composition, so the effect of age differences on the crude death rates was removed.

It may be concluded that the mortality risks in the population of Town A are higher than those in Town B. In other 
words, from the mortality conditions, it may be inferred that people of Town A have poorer health than those of 
Town B. 
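The statement about age composition can be checked directly from Table 13.3. This sketch computes the share of each town's population aged 60 and over (the last three age groups); the function name is illustrative.

```python
# Populations for age groups 0-4, 5-14, 15-49, 50-59, 60-69, 70-79, 80+ (Table 13.3).
POP_A = [9222, 19576, 39056, 4156, 2688, 1489, 334]
POP_B = [6473, 13740, 22458, 6164, 4165, 2795, 972]

def share_aged_60_plus(pop):
    # The last three groups (60-69, 70-79, 80+) make up the population aged 60 and over.
    return sum(pop[-3:]) / sum(pop)

print(f"Town A: {share_aged_60_plus(POP_A):.1%} aged 60+")  # 5.9%
print(f"Town B: {share_aged_60_plus(POP_B):.1%} aged 60+")  # 14.0%
```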

Indirect method of standardization 

The death rates of Town C are taken as the standard death rates to be applied to the populations of Town A and Town B in Table 13.3. The calculation process is given in Table 13.5.



Table 13.5 Standard death rates and expected deaths: indirect method of computation 



Age group   Standard          Town A                   Town B
(years)     death rates   Population   Expected    Population   Expected
                                       deaths                   deaths
0-4           0.0260          9222        240          6473        168
5-14          0.0020         19576         39         13740         28
15-49         0.0029         39056        115         22458         66
50-59         0.0116          4156         48          6164         72
60-69         0.0301          2688         81          4165        125
70-79         0.0600          1489         89          2795        168
80+           0.1233           334         41           972        120
Total         0.0105         76521        653         56767        747



(a) Calculate the expected number of deaths by multiplying the standard death rate by the population of each age 
group in each town, separately. The results are shown in Table 13.5. 

(b) Calculate the index death rate by dividing the total expected number of deaths in each town by the total population of the same town, separately:

Index death rate = [Total number of expected deaths x 1000]/Population. 

For Town A it is: (653/76521) x 1000 = 8.53.

For Town B it is: (747/56767) x 1000 = 13.16.



(c) Divide the crude death rate of the standard population by the index death rate to obtain the standardizing factor.

For Town A, the standardizing factor is: 10.50/8.53 = 1.231. 

For Town B, the standardizing factor is: 10.50/13.16 = 0.798. 

(d) Multiply the crude death rates of Towns A and B by their standardizing factors to obtain the standardized death rates.

For Town A, the standardized death rate is: 9.16 x 1.231 = 11.276.

For Town B, the standardized death rate is: 12.58 x 0.798 = 10.039.

Comments 

The crude death rates of Towns A and B are 9.16 and 12.58 per 1000, respectively. The standardized death rates are 11.28 for Town A and 10.04 for Town B. The indirect method of standardization therefore gave similar results to the direct method (11.41 and 10.06) for the same towns.
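The four steps of the indirect method can be sketched in Python as follows, using the Town C age-specific rates from Table 13.3 as standard rates and 10.50 as the standard crude death rate. Expected deaths are rounded to whole numbers per age group, as in Table 13.5; later steps are not rounded, so the results (about 11.27 and 10.04) can differ from the handout in the second decimal place. The names are illustrative.

```python
# Age groups 0-4, 5-14, 15-49, 50-59, 60-69, 70-79, 80+ (Table 13.3).
STANDARD_RATES = [26.00, 2.01, 2.94, 11.63, 30.06, 60.00, 123.28]  # Town C, per 1000
STANDARD_CDR = 10.50                                              # Town C crude death rate
POP_A = [9222, 19576, 39056, 4156, 2688, 1489, 334]
POP_B = [6473, 13740, 22458, 6164, 4165, 2795, 972]
CDR_A, CDR_B = 9.16, 12.58

def indirect_sdr(pop, crude_rate, std_rates=STANDARD_RATES, std_cdr=STANDARD_CDR):
    """Return (expected deaths, index death rate, standardizing factor, SDR)."""
    # (a) Expected deaths: standard rate x population, rounded per group as in Table 13.5.
    expected = sum(round(p * r / 1000) for p, r in zip(pop, std_rates))
    # (b) Index death rate per 1000 of the town's own population.
    index_rate = expected * 1000 / sum(pop)
    # (c) Standardizing factor.
    factor = std_cdr / index_rate
    # (d) Standardized death rate.
    return expected, index_rate, factor, crude_rate * factor

for town, pop, cdr in (("A", POP_A, CDR_A), ("B", POP_B, CDR_B)):
    expected, index_rate, factor, sdr = indirect_sdr(pop, cdr)
    print(f"Town {town}: expected {expected}, index rate {index_rate:.2f}, "
          f"factor {factor:.3f}, SDR {sdr:.2f}")
```

Since the crude death rate times the town's population gives its observed deaths, the result is algebraically the same as multiplying the standard crude death rate by the ratio of observed to expected deaths, that is, by the standardized mortality ratio.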
OUTLINE 14 Measurement of fertility



Introduction to the lesson 

Planning of effective and efficient maternal and child health (MCH) and family planning services requires reliable information on fertility and the fertility behaviour of the community, including their determinants. Statistics on population characteristics and on mortality and morbidity are also important. Fertility level indices are essential for the monitoring of such services.

Objective of the lesson 

The objective of this lesson is to give the students the skills to measure levels of fertility and to appreciate the limitations of these measures.

Enabling objectives 

At the end of this lesson, the students should be able to: 

(a) State the importance of fertility statistics in the planning of health services, especially in 
MCH and family planning. 

(b) Describe the sources of data, their advantages, disadvantages and limitations. 

(c) Discuss the factors affecting fertility behaviour of the community. 

(d) State and define the main indices for measuring fertility, and discuss their uses and limitations.

(e) Describe the net reproduction rate and explain its use for the measurement of population 
replacement. 

(f) State and define the main indices for measuring family planning activities.

(g) Compute from appropriate data the following rates:

• crude birth rate;

• general fertility rate;

• age-specific fertility rate;

• total fertility rate;

• gross and net reproduction rates;

• rates of family planning activities:

— current users' rate;

— beginners' rate;

— induced abortion rate;

— open and closed birth intervals.
Required previous knowledge

Contents of Outlines 2, 11 and 13.

Lesson content 

This lesson is concerned with the importance of fertility statistics in planning 
maternal and child health (MCH) and family planning services. 

Sources of fertility data 

The following are sources of fertility data: 

— hospital and maternity records; 

— registration of births, stillbirths, abortions (refer to Outline 13); 

— population census (refer to Outline 11); 

— ad hoc surveys. 

Indices of fertility 

The five main indices for measuring fertility are: crude birth rate, general fertility rate, age-specific fertility rate, total fertility rate, and gross reproduction rate (for definitions, see Handout 14.1). All these rates are usually calculated for one year (but they may be calculated for any other specified period).
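Handout 14.1 gives the definitions; assuming the usual ones (live births per 1000 mid-year population for the crude birth rate, and live births per 1000 women aged 15-49 for the general fertility rate), the sketch below applies them to the Outline 13 region. It assumes that the 1315 births reported there include the 20 stillbirths, giving 1295 live births, and takes the population figures from Table 13.1.

```python
LIVE_BIRTHS = 1315 - 20       # ASSUMPTION: the 1315 births in Outline 13 include 20 stillbirths
MID_YEAR_POPULATION = 52889   # Table 13.1, both sexes, all ages
WOMEN_15_49 = [3321, 2386, 1607, 1381, 1145, 1313, 1291]  # Table 13.1, females 15-19 to 45-49

crude_birth_rate = LIVE_BIRTHS * 1000 / MID_YEAR_POPULATION
general_fertility_rate = LIVE_BIRTHS * 1000 / sum(WOMEN_15_49)
print(f"Crude birth rate: {crude_birth_rate:.1f} per 1000 population")
print(f"General fertility rate: {general_fertility_rate:.1f} per 1000 women aged 15-49")
```

The general fertility rate is about four times the crude birth rate here because its denominator excludes men, children and older women, who cannot give birth.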

Uses and limitations of fertility indices 

Crude birth rate: used to indicate the general magnitude of the fertility level. Limitations include its sensitivity to differences in age structure of the population, and the fact that its denominator includes sections of the population not able to give birth.

General fertility rate: intended as an index of general fertility that relates births mainly to those at risk of giving birth (women of reproductive age). Its value is affected by the age distribution of women in the reproductive age group.

Age-specific fertility rate: used to measure the reproductive performance of women 
of a given age, thus showing variation in fertility by age. Its use, to indicate 
variations in fertility level with age, should take into account the fact that the 
comparison is between different groups of women. It is not a summary index of 
overall fertility level of the whole population. 

Total fertility rate: used as a standardized index for the overall fertility level. It overcomes the limitations of the crude birth rate and the general fertility rate. Its use as an indicator of cohort fertility is limited by the fact that the experiences summarized refer to groups of women at different ages, that is, a synthetic cohort, not a real cohort.

Gross reproduction rate: has the same uses as the total fertility rate, but also purports to give an indication of replacement of females in the population per generation. It has the same limitations as the total fertility rate; in addition, it can distort comparisons of populations with differing sex ratios at birth. Its use as a replacement index does not take mortality into account.
Net reproduction rate: shows the size of a particular generation in relation to
the previous generation, that is, the rate of replacement of females in the
population per generation, according to present schedules of fertility and mortality.
It has long-term implications for population growth. It is calculated by
assuming that newborn females are subjected, throughout their lives, to the
current observed age-specific mortality rates in the population. It is further assumed
that survivors will bear children according to the current age-specific
fertility rates. The total number of female offspring divided by the number in the
original population is the net reproduction rate. If this rate is less than one, then
the reproductive performance of the population is said to be below replacement
level.
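The calculation just described can be sketched in Python. All the numbers below are hypothetical, and approximating the years a woman lives in each 5-year age group by 5 times the probability of surviving to the group's midpoint is an assumption of this sketch, not a method stated in the outline. Dropping the survival terms gives the gross reproduction rate, which is why the net reproduction rate can never exceed it.

```python
# Hypothetical inputs for illustration only (not real data).
# Female age-specific fertility rates: daughters born per woman per year,
# for the 5-year age groups 15-19, 20-24, ..., 45-49.
female_asfr = [0.030, 0.100, 0.090, 0.060, 0.030, 0.010, 0.002]
# Probability that a newborn girl survives to the middle of each age group,
# under current age-specific mortality rates.
survival_to_midpoint = [0.93, 0.92, 0.91, 0.90, 0.89, 0.88, 0.86]

AGE_GROUP_WIDTH = 5  # years in each age group


def gross_reproduction_rate(f_asfr, width=AGE_GROUP_WIDTH):
    """Average daughters per woman if no woman died before age 50."""
    return width * sum(f_asfr)


def net_reproduction_rate(f_asfr, survival, width=AGE_GROUP_WIDTH):
    """Average daughters per newborn girl under current fertility and mortality.

    Years lived in each age group are approximated by
    width * (probability of surviving to the group's midpoint).
    """
    return width * sum(f * s for f, s in zip(f_asfr, survival))


grr = gross_reproduction_rate(female_asfr)
nrr = net_reproduction_rate(female_asfr, survival_to_midpoint)
print(f"GRR = {grr:.2f}, NRR = {nrr:.2f}")
print("below replacement level" if nrr < 1 else "at or above replacement level")
```

With real data, the survival probabilities would come from a life table for the same population and period as the fertility rates.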

Measuring the activities of family planning 

Useful measures of family-planning activities are the current users' rate, beginners'
rate, open birth interval, closed birth interval, and induced abortion rate
(for definitions, see Handout 14.1).












Crude birth rate; general fertility rate; age-specific fertility rates; total fertility rate; gross reproduction
rate; net reproduction rate; family planning; current users' rate; beginners' rate; induced
abortion rate; open birth interval; closed birth interval.



Structure of the lesson 

(a) Explain the meaning and importance of the study of fertility in health services planning and 
population growth. 

(b) Explain the sources of fertility data, their advantages and limitations. 

(c) Explain the definitions and uses of the common fertility indices. Explain the age patterns of
fertility and mortality in terms of cohort survival and reproduction, differentiating between
the concepts of synthetic and real birth cohorts involved in the computation and interpretation
of the indices concerned.

(d) Explain the importance of family planning and indices of the activities of family planning. 

Lesson exercises 

The teacher should ask the students to list the common indices of fertility and the sources of data
for the calculation of the indices. The students should describe the factors which affect trends in
fertility. Provide national data for calculating the indices of fertility, and ask the students to compare
the results with those of other countries or regions.



■ Table 14.1 gives the mid-year population of women in the reproductive age 
group and the number of births among these women in the area covered by a 
rural health centre in Turkey in 1994. 
Table 14.1 Mid-year population of women in the reproductive age group and number
of births

Age         Mid-year  Number of live births     Number of    Total
          population    Male  Female   Total  stillbirths   births

15-19          3 321     109      86     195            1      196
20-24          2 386     262     262     524            4      528
25-29          1 607     155     165     320            5      325
30-34          1 381      64      66     130            4      134
35-39          1 145      40      32      72            2       74
40-44          1 313      17      24      41            3       44
45-49          1 291      18      15      33            1       34

Total         12 444     665     650   1 315           20    1 335



Calculate the crude birth rate and all the fertility rates given in Handout 14.1. 

■ Prepare a report on the fertility level of this health centre region. 

■ Define the general fertility rate and give the estimate for your country. How 
is the rate different from that of the total fertility rate? 



■ List, and give estimates of, four measures of fertility for your country. Which
one is the best indicator of fertility levels?
Definitions of new terms and concepts



All the following fertility rates are usually calculated for one year (but they may be calculated for any other specified
period).

Age-specific fertility rate: [Number of live births to women of specified age or age group in a year x 1000]/Mid-
year population of women of the specified age or age group.

Crude birth rate: [Number of live births in a year x 1000]/Mid-year population. 

General fertility rate: [Number of live births in a year x 1000]/Mid-year population of women of childbearing age
(15-44 years or 15-49 in some countries).

Gross reproduction rate: The total fertility rate restricted to female births only. It is the average number of 
daughters that a synthetic cohort of women would have at the end of the reproductive period, if there were no 
mortality among the women. 

Net reproduction rate: Rate of replacement of females in the population per generation, with the current 
schedules of fertility and mortality. 

Total fertility rate: Sum of all the age-specific fertility rates for each year of age from 15 to 49 years. It is the 
average number of children that a synthetic cohort of women would have at the end of the reproductive 
period, if there were no mortality among the women. 
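The definitions above can be applied to Table 14.1. The sketch below is one way to do this, assuming that each 5-year group rate stands for five single-year rates (so group rates are multiplied by 5 in the total fertility rate and gross reproduction rate) and using women aged 15-49 as the denominator of the general fertility rate. The crude birth rate is left out because it needs the total mid-year population of the area, which the table does not give. The names `TABLE_14_1` and `fertility_rates` are illustrative only.

```python
# Table 14.1: rural health centre area, Turkey, 1994.
# age group -> (mid-year female population, male live births, female live births)
TABLE_14_1 = {
    "15-19": (3321, 109, 86),
    "20-24": (2386, 262, 262),
    "25-29": (1607, 155, 165),
    "30-34": (1381, 64, 66),
    "35-39": (1145, 40, 32),
    "40-44": (1313, 17, 24),
    "45-49": (1291, 18, 15),
}
AGE_GROUP_WIDTH = 5  # each group covers 5 single years of age


def fertility_rates(table, width=AGE_GROUP_WIDTH):
    """Return (ASFR per 1000 by group, GFR per 1000, TFR, GRR).

    Live births only; stillbirths are excluded. The GFR denominator is
    all women aged 15-49 in the table.
    """
    asfr = {age: (m + f) / pop * 1000 for age, (pop, m, f) in table.items()}
    women = sum(pop for pop, _, _ in table.values())
    live_births = sum(m + f for _, m, f in table.values())
    gfr = live_births / women * 1000
    # Each 5-year rate stands for 5 single-year rates; divide by 1000 to
    # convert "per 1000 women" into "per woman".
    tfr = width * sum(asfr.values()) / 1000
    # GRR: the same kind of sum, counting female live births only.
    grr = width * sum(f / pop for pop, _, f in table.values())
    return asfr, gfr, tfr, grr


asfr, gfr, tfr, grr = fertility_rates(TABLE_14_1)
for age, rate in asfr.items():
    print(f"{age}: {rate:6.1f} per 1000 women")
print(f"GFR = {gfr:.1f} per 1000 women aged 15-49")
print(f"TFR = {tfr:.2f} children per woman, GRR = {grr:.2f} daughters per woman")
```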

The following rates can be calculated for any period of time, as well as for a year. 

Beginners' rate: [Number of new users of any contraceptive method in a specified period x 100]/Total number of
non-users.

Current users' rate: [Number of current users of any contraceptive method in a specified period x 100]/Total
target population. This rate can also be calculated for specific subgroups, for example by contraceptive method
or by level of education.

Induced abortion rate: [Number of induced abortions in a specified period x100]/Total number of live births in 
this specified period. Abortions performed for medical purposes and spontaneous abortions are excluded. 

Additional definitions: 

Closed birth interval: Interval between two successive births. 

Open birth interval: Interval between last birth and the date of the study. 

Real birth cohort: A group of births occurring at the same time. 

Synthetic birth cohort: An artificial birth cohort, composed of a cross-sectional sample of the population. 
OUTLINE 15 Population dynamics



Introduction to the lesson 

There is a close interaction between population dynamics and medicine. The practice of medicine
and the management of health development require an understanding of the current size
and composition of the population, and of the determinants of changes in these characteristics. 
Health planners must also be able to describe such changes quantitatively. 

Objective of the lesson 

The objective of the lesson is to familiarize the students with the concept of population dynamics,
to explain to them the nature of the interaction between the various factors that produce
changes in population size and structure, and to teach them how to use appropriate indices to 
describe population change. 

Enabling objectives 

At the end of the lesson the students should be able to: 

(a) State the factors determining the age and sex composition of a population and the changes 
over time. 

(b) State the determinants of changes in population size. 

(c) Make population projections using arithmetic and geometric progressions, given the appropriate
formulae.

(d) State and define the indices for measuring survivorship. 

(e) Describe and interpret population pyramids. 

(f) Explain the concept of demographic transition.

Required previous knowledge 

Contents of Outlines 2, 11, 13 and 14. 



Lesson content 

Definition and concept of population dynamics 

Population dynamics is the study of changes in population size and structure. 
The determinants of population growth are the number of births and deaths, 
and the amount of migration into and out of the area. 

Population structure or composition 

Population structure refers to the distribution of people by certain categories, 
variables, or characteristics, for example, age, sex and geographical area: 
• the distribution by age shows the number or proportion of persons in each
age group; 

• the sex distribution shows the number or proportion of persons of each sex; 

• the distribution by geographical area shows the number or proportion of 
persons living in each area. 

Population pyramid 

Definition: a graphic presentation of the age and sex distribution of a population 
in the shape of a pyramid. 

Uses: to study sex-specific age distribution of a population. A series of population
pyramids over time can be used to study ageing patterns of birth cohorts.

Construction: the basic pyramid form consists of bars, usually representing 5-year
age groups (or other age combinations) in ascending order, from the lowest
to the highest age group. Conventionally, the left side is used for males and the
right side is used for females.
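As a minimal illustration of drawing a pyramid by computer, the sketch below prints a text pyramid following the construction just described: 5-year bars in ascending order with the youngest group at the base, males on the left and females on the right. The counts are hypothetical, and rounding bar lengths down to whole characters is a simplification of this sketch.

```python
# Hypothetical counts (in thousands) by 5-year age group, youngest first.
AGE_GROUPS = ["0-4", "5-9", "10-14", "15-19", "20-24"]
MALES = [120, 110, 95, 80, 70]
FEMALES = [115, 108, 96, 82, 72]


def ascii_pyramid(groups, males, females, per_char=10, mark="#"):
    """Return text lines of a population pyramid.

    Males are drawn on the left, females on the right, and the youngest
    group is printed last so that it forms the base. Each mark stands for
    `per_char` persons; bar lengths are rounded down.
    """
    half_width = max(max(males), max(females)) // per_char
    lines = []
    for age, m, f in reversed(list(zip(groups, males, females))):
        left = (mark * (m // per_char)).rjust(half_width)
        right = (mark * (f // per_char)).ljust(half_width)
        lines.append(f"{left} {age:^7} {right}")
    return lines


for line in ascii_pyramid(AGE_GROUPS, MALES, FEMALES):
    print(line)
```

Plotting each bar as a proportion of the total population, rather than as a count, makes pyramids for populations of different sizes comparable.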

Population change with time 

Various factors determine the age and sex structure of a population, and the 
changes over time. 

(a) The "cause and effect" aspect of population structure: 

• the present population structure is the effect of previous structures; for 
example, a current population with a large number of 5-10-year-olds is 
the effect of a large number of births 5-10 years ago. 

• the present composition is the cause of the future composition. 

(b) Influence of changing demographic characteristics: 

• changes in fertility levels and patterns are the most important determinants
of change in population structure: a prolonged decrease in fertility
level (for example, as a result of widespread contraceptive practices or an
increase in age at marriage) will have a marked effect on the structure of
the population;

• changes in mortality levels and patterns also modify population structure, 
but are less important than fertility. 

(c) Population structure can also be affected by unusual phenomena, such as 
wars, mass migration, or a pandemic outbreak of a killer disease. 

Indices used to describe population growth 

(a) Type of population growth: 

• natural (or reproductive) growth is the balance between births and deaths; 

• total growth is the balance between births, deaths and net migration. 

(b) Measurement of population growth:

• the general measurement of growth is given by the intercensal percentage
change in size, that is,

$$\frac{P_t - P_0}{P_0} \times 100,$$

where $P_0$ is the population size at a given census, and $P_t$ is the population size at
a census $t$ years later;

• the rate of population growth is the change in size per unit time; 

• the annual rate of growth r is the relative change in population size per year; 

• there are two commonly used assumptions concerning the nature of growth,
namely:

the arithmetical growth model: $P_t = P_0(1 + rt)$

the geometric growth model: $P_t = P_0(1 + r)^t$

where $P_0$ = baseline population, $P_t$ = population at time $t$, and $r$ = annual rate of
population change.
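Given population sizes from two censuses, each growth model can be solved for $r$: the arithmetical model gives $r = (P_t/P_0 - 1)/t$ and the geometric model gives $r = (P_t/P_0)^{1/t} - 1$. A short sketch follows; the census figures are hypothetical and the function names are illustrative only.

```python
def percentage_change(p0, pt):
    """Intercensal percentage change in population size."""
    return (pt - p0) / p0 * 100


def arithmetic_rate(p0, pt, t):
    """Annual rate r such that pt = p0 * (1 + r * t)."""
    return (pt / p0 - 1) / t


def geometric_rate(p0, pt, t):
    """Annual rate r such that pt = p0 * (1 + r) ** t."""
    return (pt / p0) ** (1 / t) - 1


# Hypothetical censuses 10 years apart (millions).
p0, pt, t = 8.0, 10.0, 10
print(f"change: {percentage_change(p0, pt):.1f}%")
print(f"arithmetic r = {arithmetic_rate(p0, pt, t):.4f}")
print(f"geometric r  = {geometric_rate(p0, pt, t):.4f}")
```

For the same pair of censuses the geometric rate is always the smaller of the two when the population has grown, because it compounds; the two rates should therefore not be mixed when making projections.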

Simple population projections 

The expressions linking $P_t$ with $P_0$ and $r$ can be used to make projections of the
present population to any time in the future (refer to Handout 15.2).

Hazard of population projection 

Accurate projections of the population require a realistic estimate of r during the
projection period, allowing for any change in the growth rate itself over that period.

Indices for measuring survival 

There are two indices for measuring survival: 

Expectation of life at birth is the number of years of life a newborn baby is expected 
to live under the prevailing mortality conditions in the population. 

The probability of survival from one age to another is the chance that those attain- 
ing a stated age will survive to a given higher age. 

Concept of demographic transition 

• Stable population 

• Stationary population 

• Implications of rapid population growth; time for the population to double
(the "70/r" rule, with r expressed as a percentage)
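The "70/r" rule can be checked against the exact doubling time implied by the geometric growth model, $t = \ln 2 / \ln(1 + r)$. The rule works because $100 \ln 2 \approx 69.3$, which is rounded up to 70. This is a sketch with illustrative function names; the growth rates used are arbitrary examples.

```python
import math


def doubling_time_rule_of_70(r_percent):
    """Approximate doubling time in years; r_percent is the annual growth rate in %."""
    return 70 / r_percent


def doubling_time_geometric(r):
    """Exact doubling time under P_t = P_0 (1 + r)^t; r is a proportion (0.021 = 2.1%)."""
    if r <= 0:
        raise ValueError("a population that is not growing never doubles")
    return math.log(2) / math.log(1 + r)


for r_percent in (1.0, 2.1, 3.5):
    approx = doubling_time_rule_of_70(r_percent)
    exact = doubling_time_geometric(r_percent / 100)
    print(f"r = {r_percent}%: 70/r = {approx:.1f} years, exact = {exact:.1f} years")
```

At the growth rates usually met in practice the two values differ by well under a year, which is why the rule is adequate for teaching.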









Annual rate of population increase; arithmetic progression; change in population size; change in 
population structure (age and sex); demographic transition; expectation of life at birth; geometric 
progression; migration; natural growth; population dynamics; population "explosion"; population 
pyramid; probability of survival; rate of natural increase; rate of population growth; stable population;
stationary population; survival rate; young-old population; zero population growth.
Structure of the lesson

(a) Explain the use of the population distributions by sex and age, emphasizing their importance
in the calculation of age-specific and sex-specific rates.

(b) Explain the meaning and importance of the study of population dynamics, differentiating
between changing population size and changing population structure (distribution by age
and sex).

(c) Illustrate and discuss past trends in population size in the country, and show how
these have resulted from trends in the number of births and deaths, as well as from
migration. Differentiate between external and internal migration, and between population
trends for the country as a whole and for geographical subdivisions (for example,
urban/rural areas). Show graphically, and by the trend in growth rates, whether the population
size has been changing according to arithmetic or geometric progression, and examine
the reasons for this. (The use of semi-log paper to illustrate geometric growth may
suitably be demonstrated here.) Explain the possible use of computers to draw population
pyramids.

(d) Explain the computation of the arithmetic and geometric growth rates, and show that projections
based on assumptions of arithmetic growth and of geometric growth produce different
estimates of future population size.

(e) Illustrate and discuss past trends in the age-sex composition of the population, using population
pyramid charts to facilitate comparison. Explain the implications of "young" and "old"
populations, the dependence of age structure on fertility, and how changes in fertility and
mortality affect population trends in age composition and in size.

(f) Remind students of the definitions and uses of the common fertility indices. Explain the age
patterns of fertility and mortality in terms of cohort survival and reproduction, differentiating
between the concepts of synthetic and real birth cohorts involved in the computation
and interpretation of the indices concerned.

(g) Explain the concept of the demographic transition. Differentiate between stable and stationary
populations. Discuss the implications of rapid population growth ("explosion"), of population
decline, and of zero population growth.



Lesson exercises 

The teacher should obtain population data from the two latest national population censuses.
Using these data, set exercises to test the students' knowledge of the concept of population
composition and their ability to describe it using population figures and pyramids. Questions
should elicit students' ability to calculate population growth rates and to make simple population
projections.



■ Obtain population data for your country similar to those presented for Turkey
in Outline 13 (Table 13.1). Construct a population pyramid and describe the
typical features of the population structure.



■ Using population estimates at two points in time: 

(a) Calculate the population growth rates using the arithmetic and geometric 
methods. 
(b) Use the rates to make a population projection for 10 years beyond the later
date using the arithmetical model and the geometric model of population
projection, and discuss the results of the two operations.
Definitions of new terms and concepts



Annual rate of population change: Relative change in population size (increase or decrease) per annum. 

Arithmetic progression: A series of figures is said to be in arithmetic progression when the difference between any
two adjacent figures is the same. For example, the series 3, 5, 7, 9, 11, 13, ... is in arithmetic progression.
Population sizes over a period of years are in arithmetic progression if the size changes by a constant amount
each year.

Change in population structure (age and sex): Alteration in the age-composition and sex-composition of the 
population, as a result of births, deaths and migration. 

Demographic transition: The process by which, over a number of years, continuous changes in one or more of the 
fertility, mortality and migratory rates in the population produce alterations in the characteristics and structure 
of the population. 

Expectation of life at birth: Number of years a newborn baby is expected to live, given the prevailing mortality 
conditions. 

Geometric progression: A series of ordered numbers is said to be in geometric progression if the ratio
between any two adjacent numbers is the same. For example, the series 3, 9, 27, 81, ... is in geometric
progression. Population size over a period of years is said to follow a geometric pattern of growth if the
proportional change is constant in successive years.
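The two definitions translate directly into a check on a series of values: constant differences mean an arithmetic progression, constant ratios a geometric one. The sketch below applies this to the example series from the definitions; the function name and the default tolerance are assumptions of the sketch. With real census figures the differences and ratios are never exactly constant, so a much looser `rel_tol`, or a semi-log plot, is more appropriate.

```python
from math import isclose


def progression_type(values, rel_tol=1e-9):
    """Classify a series as 'arithmetic', 'geometric', 'both' or 'neither'.

    Arithmetic: constant difference between adjacent values.
    Geometric: constant ratio between adjacent values (values must be positive).
    """
    if len(values) < 3:
        raise ValueError("need at least three values to judge a pattern")
    if any(v <= 0 for v in values):
        raise ValueError("population sizes must be positive")
    diffs = [b - a for a, b in zip(values, values[1:])]
    ratios = [b / a for a, b in zip(values, values[1:])]
    arithmetic = all(isclose(d, diffs[0], rel_tol=rel_tol) for d in diffs)
    geometric = all(isclose(q, ratios[0], rel_tol=rel_tol) for q in ratios)
    if arithmetic and geometric:
        return "both"  # only a constant series is both
    if arithmetic:
        return "arithmetic"
    if geometric:
        return "geometric"
    return "neither"


print(progression_type([3, 5, 7, 9, 11, 13]))  # arithmetic
print(progression_type([3, 9, 27, 81]))        # geometric
```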

Migration: Movement of people from one geographical location to another within the same country or across 
country borders. 

Natural growth: Change in population as a result of births and deaths, and excluding migration. 

Population dynamics: The study of changes in population size and structure over time. 

Population explosion: Rapid increase in population size. 

Population pyramid: A method of graphically depicting the age-sex composition of a population. 

Probability of survival: Chance that somebody alive at a particular age will still be alive at a given older age. 

Rate of natural increase: Relative change in population size brought about solely by the balance between births 
and deaths; it is obtained as the difference between the crude birth and death rates. 

Rate of population growth: Relative change in population size as a result of births, deaths and net migration. 

Stable population: A population that has been growing at a constant rate over a number of years. 

Stationary population: A population with no migration and for which the crude birth rate is equal to the crude 
death rate. 

Young and old populations: The median age is usually used as a basis for describing a population as "young" or 
"old". Populations with medians under 20 years may be described as young, those with medians of 30 years 
or over as old, and those with medians between 20 and 29 as of intermediate age. 

The proportion of elderly persons can also be used as an indicator of young or old population. On this basis, 
populations with 10% or more people aged 65 years and over may be said to be old. 
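Both criteria can be applied in a few lines. The outline does not say how the median age is obtained; the sketch below assumes grouped data and uses linear interpolation within the group containing the median, a common approach. It reads "medians between 20 and 29" as 20 to under 30 years. The age counts are hypothetical and the function names are illustrative.

```python
def grouped_median(lower_bounds, width, counts):
    """Median age from grouped counts, by linear interpolation within the median group.

    Assumes equal-width groups starting at `lower_bounds` and that
    persons are spread evenly within each group.
    """
    half = sum(counts) / 2
    cumulative = 0
    for lower, count in zip(lower_bounds, counts):
        if count > 0 and cumulative + count >= half:
            return lower + (half - cumulative) / count * width
        cumulative += count
    raise ValueError("counts must contain at least one positive value")


def classify_by_median_age(median_age):
    """Young (under 20), intermediate (20 to under 30) or old (30 and over)."""
    if median_age < 20:
        return "young"
    if median_age < 30:
        return "intermediate"
    return "old"


def is_old_by_elderly_share(share_aged_65_plus):
    """Old if 10% or more of the population is aged 65 years and over."""
    return share_aged_65_plus >= 0.10


# Hypothetical counts (thousands) for ages 0-4, 5-9, ..., 20-24.
median = grouped_median([0, 5, 10, 15, 20], 5, [30, 25, 20, 15, 10])
print(median, classify_by_median_age(median))
```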

Zero population growth: Absence of growth in the population. 
Computation of population projection



Example 

If a country with a population of 10.4 million inhabitants has a constant annual population growth rate of 2.1%, what
would be its population in 10 years' time?

The present population, $P_0$, is 10.4 (million); $r$ = 2.1% = 0.021; $t$ = 10.

$P_{10}$ is the required population ten years from now.

Using the arithmetical model:

$$P_{10} = 10.4(1 + 0.021 \times 10) = 10.4 \times 1.21 = 12.6,$$

giving an estimated population in ten years' time of 12.6 million.

Using the geometric model:

$$P_{10} = 10.4(1 + 0.021)^{10} = 10.4 \times 1.2310 = 12.8.$$

Thus the population in ten years' time is estimated as 12.8 million.

It is not surprising that the results differ (however slightly), as the two relationships used are based on different 
assumptions for population growth. The geometric model is usually more realistic, and is to be preferred. 
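The worked example can be reproduced with two short functions, one per growth model. This is a sketch; the function names are illustrative, and the inputs are those of the example above.

```python
def project_arithmetic(p0, r, t):
    """Population after t years if it grows by the constant amount p0 * r each year."""
    return p0 * (1 + r * t)


def project_geometric(p0, r, t):
    """Population after t years if it grows by the constant proportion r each year."""
    return p0 * (1 + r) ** t


p0, r, t = 10.4, 0.021, 10  # millions, 2.1% per year, 10 years
print(f"arithmetic: {project_arithmetic(p0, r, t):.1f} million")  # 12.6 million
print(f"geometric:  {project_geometric(p0, r, t):.1f} million")  # 12.8 million
```

The gap between the two projections widens as the projection period lengthens, because geometric growth compounds each year.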
OUTLINE 16 Indicators of levels of health



Introduction to the lesson 

In order to evaluate the effectiveness and efficiency of a health treatment, health programme or
health service, and the extent to which objectives and targets are being achieved, measurable
yardsticks (indicators) are needed. Indicators are used to measure changes and to compare different
patient groups, treatment regimens, countries and regions, as well as different periods of
time.

Objective of the lesson 

The objective of this lesson is to introduce the students to the concept of health indicators, and 
their uses in health monitoring and surveillance. 

Enabling objectives 

At the end of the lesson the students should be able to: 

(a) Explain the need for monitoring health activities. 

(b) Describe a health indicator.

(c) State the requirements of a good indicator. 

(d) Give examples of health indicators used for monitoring health activities. 

(e) Explain the meaning, advantages and disadvantages of each indicator. 

(f) Identify the data necessary for calculating each indicator.

Required previous knowledge 

Contents of Outlines 1, 4, 5 and 6. 



Lesson content 

The need for monitoring health activities and levels of health 

There is a need to monitor health activities and levels of health: 

• to determine the extent to which targets are being reached; 

• to assess the impact and effectiveness of health programmes; 

• to provide information for the programming and re-programming of health
activities.



Indicators of levels of health 

Health indicators are indices used to measure change or monitor health 
activities. 
Expected requirements of an indicator

The most important desirable characteristics of an indicator (see Handout 16.1) 
are: 

• feasibility; 

• reliability; 

• relevance; 

• sensitivity; 

• specificity; 

• validity. 

Types of indicators 

Types of indicators (see Handout 16.2) include: 

• health policy indicators; 

• social and economic indicators; 

• indicators of the provision of health care; 

• health status indicators. 

Proxy indicators 

Proxy indicators are used in place of more definitive (and possibly more objective)
indicators which may be more difficult to measure or compute.

Sources of data for the various health indicators 

Data sources include: 

• vital events register; 

• population censuses; 

• routine health services records; 

• epidemiological surveillance data; 

• sample surveys; 

• disease registers. 




Completeness of coverage; feasibility; goal of a health programme; health status indicator; 
indicator; objective of a health programme; process indicator; proxy indicator; relevancy of an 
indicator; sensitivity of an indicator; specificity of an indicator; target (of a programme); validity of 
an indicator. 



Structure of the lesson 

(a) When presenting the lesson, the teacher should emphasize the need, use, interpretation, 
advantages and disadvantages of each indicator, rather than their computation. The use of 
indicators as tools for health monitoring should also be emphasized (refer to Handout 16.3). 
(b) Explain, with examples, the different groupings of health indicators shown in Handout 16.2.

(c) As far as possible, use real data which are relevant to the country when discussing national
indicators. The discussion should concentrate on inter-regional comparisons and comparison
of indicators over time, to show trends.



Lesson exercises 

Encourage students to draw on all the previous lessons to demonstrate how the knowledge 
could be used to monitor health care activities. The exercises for this lesson should, therefore, 
aim at helping the students consolidate their knowledge of the need for health monitoring, the 
need for health indicators and their use (including the type and source of data needed for their 
computation). 



■ List six important health indicators and briefly describe their use. 



■ List and rank five indicators of development used to assess a country. 



■ A sample of 5018 inhabitants in three counties of Iganga District, Eastern
Region of Uganda, was selected by means of a multistage cluster sampling
procedure and all were interviewed. Half the population (50.2%) were under
15 years of age and 4.2% were aged one year or less. The male to female ratio
was 1:1.1. Most men above the age of 15 years were subsistence farmers and
most women worked at home.

A total of 38% of the people of school age had no education. The male adult 
literacy rate was 61.5%, and the female adult literacy rate was 40.2%. 

The majority of the households (82.9%) used a well or an unprotected spring as 
their main source of water. A total of 30% of households did not have a pit 
latrine. The average number of people per sleeping room was 2.4. 

Of the people interviewed, 5% had been admitted to hospital over the past
one-year period for various conditions, including sleeping sickness and measles and
for delivery. Of the people interviewed, 22% had been sick within one week of
the interview. The major causes of morbidity were fever, clinical malaria,
respiratory conditions and non-specific pains.

Clinic utilization appeared to be high for inhabitants living within 10 kilometres
of a health centre. Self-care was widely practised: between 42.1% and
65% of the people had purchased drugs, or had had drugs purchased for them,
without prescription, within the six-month period prior to the interview.

The infant mortality rate was estimated to be between 126 and 165 per 1 000 live
births. Of 78 deaths in children under five years of age, 47 occurred in infants
and 17 occurred in children less than one month of age. Measles was the
major cause of under-five mortality (38%). The birth rate estimate was 51 per
1 000.

Immunization coverage in the preschool population studied was poor. Only 12% 
of 999 children aged less than 5 years had BCG immunization; 5% had a clinic 
card; and DPT (diphtheria-pertussis-tetanus), polio and measles immunization 
was less than 2%. 
The nutritional status, as measured by anthropometric parameters of weight
and height, showed that 80% of the children had satisfactory to good nutritional
status. The estimate of lameness due to polio was 6.6 per 1 000 children aged 15
years or below.

(a) Identify the indicators in the summary report. 

(b) Group the indicators into the four categories described in Handout 16.2.
published document WHO/HST/SC1/96.8; available on request from Department of Health
Definitions of new terms and concepts in the
context of health indicators 



(Health) status indicator: An indicator of the level of the health phenomena of interest. 

Example: Average annual number of cases (episodes) of diarrhoea per child under five years of age. 

Feasibility: The ability to obtain the data needed to compute the indicator. 

Example: An indicator of fetal loss may not be feasible, since not all data on fetal losses are routinely collected. 

Goal (of a health programme): The ultimate aim of a health programme. 

Example: polio eradication. 

Indicator: A variable that helps to measure changes directly or indirectly and is used to assess the extent to which 
objectives and targets are being attained. 

Example: see Handout 16.3. 

Objective (of a health programme): A measurable state a health programme is expected to be in, at a given time, 
as a result of the application of programme activities, procedures and resources. 

Example: An objective of an expanded programme of immunization could be effectively to immunize at least 90% of 
the eligible children by the end of the current 5-year national health programme. 

Process indicator: A measure of the extent, efficiency or quality of service performance. 

Example: Proportion of pneumonia cases seen who receive standard case management at health facilities. 

Proxy indicator: An indicator used in place of a direct indicator which may be more difficult to measure or compute. 

Example: School absenteeism may be used as a proxy indicator for general morbidity in school-age children. 

Relevance: The extent to which an indicator contributes to the understanding of the phenomena of interest. 

Example: The proportion of preschool children (under 5 years of age) more than 2 SD below the median height-for-age 
of the WHO/National Center for Health Statistics reference population contributes to the understanding of 
childhood moderate and severe stunting. 

Reliability: The indicator should be reproducible if measured by different people under similar circumstances. 

Example: Infant mortality is a reliable indicator of early childhood mortality in countries with comprehensive birth and 
death registration. 

Sensitivity: The degree to which an indicator reflects changes in the phenomena of interest. 

Examples: The quantity of non-expired drugs by category at a health facility is a sensitive indicator of drug supply at 
the facility. In many developing countries, outpatient attendance rates at public health facilities are a sensitive 
(proxy) indicator of the supply of drugs at those facilities. 

Specificity: The ability of an indicator to reflect changes in only the specific phenomena of interest. 

Example: The amount of drugs dispensed daily at a health facility is not a specific indicator of drug supply at the 
facility. 
HANDOUT 16.1 (continued)



Target (of a programme): An intermediate result towards an objective that a programme seeks to achieve. 

Example: The target of an immunization programme could be the vaccination of 95% of all the children under one 
year old, this year, according to the national immunization schedule. 

Validity: The degree to which an indicator is a true expression of the phenomena of interest. 

Example: The proportion of the national health budget spent on drugs is not a valid indicator of the existence of drugs
in health facilities.
HANDOUT 16.2



Examples of the four groups of indicators for
monitoring progress towards achievement of
health for all



Health policy indicators 

• High-level political commitment to health for all. 

• Allocation of adequate resources for primary health care. 

• Level of community involvement in attaining health for all. 

Social and economic (background) indicators 

• Rate of population increase. 

• Gross national product (GNP) or gross domestic product (GDP). 

• Income distribution. 

• Work availability. 

• Adult literacy rate. 

• Adequacy of housing expressed as number of persons per room. 

Indicators of the provision of health care (process indicators) 

• Availability. 

• Physical accessibility. 

• Economic and cultural accessibility. 

• Indicators for assessing quality of care. 

• Indicators of coverage by primary health care: 

— level of health literacy; 

— availability of safe water in the home or within a short walking distance; 

— birth attendance by trained personnel; 

— availability of essential drugs throughout the year. 

Health status indicators 

• Percentage of newborn infants with birth weight of at least 2500 g. 

• Percentage of children that have a weight-for-age that corresponds to a specified norm. 

• Infant mortality rate, child mortality rate, under-5-year mortality rate. 

• Life expectancy at a given age. 
HANDOUT 16.2 (continued)



• Maternal mortality rate. 

• Disease-specific mortality rates. 

• Disease-specific morbidity rates. 

• Disability rates. 
HANDOUT 16.3



Some of the indicators for monitoring the goals 
and targets of the World Summit for Children 



Indicators of mortality 

• Infant mortality rate: the annual number of deaths of infants under one year of age per 1000 live births. 

Indicators of childhood nutrition 

• Underweight prevalence: proportion of preschool children (under 5 years of age) more than 2 SD (moderate and 
severe) or more than 3 SD (severe) below the median weight-for-age of the WHO/National Center for Health 
Statistics reference population. 

Indicators of water and sanitation 

• Proportion of the population with access to an adequate amount of safe drinking-water in a dwelling or located
within a convenient distance from the user's dwelling.

• Proportion of the population with access to a sanitary facility for human excreta disposal in a dwelling or located
within a convenient distance from the user's dwelling.

Indicators of disability

• Disability type-specific prevalence: the total number of persons with disability, specifying the number having serious
difficulty in seeing, hearing or speaking, moving, learning or comprehending, or having strange or unusual
behaviour, or other disability of duration of at least six months or of an irreversible nature, in the following age
groups: 0-4, 5-14, 15-19 and 20 and over.

Indicators of health and nutrition of the female child, and of pregnant and lactating women

• Antenatal care: proportion of women attended at least once during pregnancy by trained health personnel.

Indicators of child spacing

• Contraception: proportion of women of childbearing age (15-49) currently using contraceptive methods (either
modern or traditional).

• Fertility: fertility rate of women 15-49 years of age.

Indicators of immunization coverage

• Proportion of children immunized against diphtheria, pertussis, and tetanus (DPT, 3 doses) before their first
birthday.

• Proportion of children immunized against measles before their first birthday.

• Proportion of children immunized against poliomyelitis (OPV, 3 doses) before their first birthday.

• Proportion of children immunized against tuberculosis before their first birthday.
Statistics in medicine,
including medical records 



OUTLINE 17 Medical records and health facility statistics



Introduction to the lesson 

A good medical records and health facility statistics system can contribute effectively towards 
improved medical care. It plays a prominent role in the evaluation of the quality of care and aids 
medical research significantly. 



Objective of the lesson 

The lesson aims to introduce the students to the medical record as an essential source of data for
statistics used to evaluate the quality of patient care and health facility utilization, and for
other activities of the health facility.



Enabling objectives 

At the end of the lesson the students should be able to: 

(a) Discuss the value of medical records to the patient, to the health facility, to the physician, for 
medical research and teaching, and as a source of data for health facility statistics. 

(b) Explain the need for standardized medical records in a health care system.

(c) Define the elements of a medical record and the terms used in the context of health facilities. 

(d) Draw up a set of minimum identification particulars for a medical record, with justifications 
for each element in the set. 

(e) Give examples of possible uses for the data elements in a medical record. 

(f) Explain the process of, and the need for, validation of data sources.

(g) Explain the legal status of the medical record with regard to: 

• confidentiality of data; 

• length of time the medical record is kept after the patient's discharge. 

(h) State, and elaborate upon, at least three limitations of the medical record as a source of
data. 

(i) Explain, calculate and interpret the indices used for measuring the quality of service 
rendered by a health facility: 

• mortality rates; 

• health facility infection rate; 

• postoperative infection rate; 

• autopsy rate; 

• caesarean section rate. 
(j) Explain other indices needed by administrators of the health facility:

• bed occupancy ratio; 

• turnover interval; 

• average duration of health facility stay. 

Required previous knowledge 

The students should have had exposure to clinical medicine and possess a good understanding of 
the health care delivery system of the country. They should also have a good knowledge of 
general statistical methods and health statistics. 



Lesson content 

Definitions of health facility terms and elements of medical records 

Definitions related to health facility statistics and medical records are given in 
Handout 17.1. 

Standardization of health facility terms, statistics and medical records 

In order to have comparable statistics, internally or externally, all health facilities
should use the same terminology. The use of different names for identical
statistics is as confusing within a health facility as it is among a group of health
facilities. A lack of uniformity in defining health facility terms may lead to
mistaken or inequitable interpretations, comparisons and judgements.



Identification data and their uses 

Health care institutions are complex organizations with medical, nursing, technical,
clerical and other staff caring for the patients. It is essential that the right
treatment be given to the right person by the appropriate member of the treatment
team. To help ensure this, identification data for each patient should
include, as a minimum:

• document reference number (when appropriate); 

• family name or surname; 

• given name (first name); 

• any other names by which known (aliases); 

• sex; 

• date of birth (day, month and year); 

• place of birth (if known); 

• home address. 
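The minimum identification particulars listed above map naturally onto a record structure. The Python sketch below uses hypothetical field and class names. It treats the reference number, aliases and place of birth as optional because the list qualifies them ("when appropriate", "if known"), and it flags blank mandatory text fields.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class PatientIdentification:
    family_name: str
    given_name: str
    sex: str
    date_of_birth: date                               # day, month and year
    home_address: str
    document_reference_number: Optional[str] = None   # "when appropriate"
    aliases: List[str] = field(default_factory=list)  # other names by which known
    place_of_birth: Optional[str] = None              # "if known"

    def missing_mandatory(self) -> List[str]:
        """Names of mandatory text fields that were left blank."""
        mandatory = {
            "family_name": self.family_name,
            "given_name": self.given_name,
            "sex": self.sex,
            "home_address": self.home_address,
        }
        return [name for name, value in mandatory.items() if not value.strip()]


record = PatientIdentification("Okello", "Sarah", "F", date(1980, 3, 14), "")
print(record.missing_mandatory())  # ['home_address']
```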

Validation of data 

Data validation is the process by which the information in a medical record is 
checked for accuracy. 

It is necessary to validate data because incorrect information in medical records
could lead to false conclusions when it is used. The information may be incorrect
because it was wrongly recorded or wrongly coded, whether by mistake or
through lack of knowledge.

Diagnoses may be validated by other health workers reviewing a random 
sample of medical records without reference to the reported final diagnoses 
and making their own diagnoses based on the data in the record. Similarly, a 
coding clerk from another institution might code the diagnoses given without 
referring to the codes already allotted. 
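The re-coding check described above can be summarized as the percentage of sampled records on which the independent reviewer agrees with the original code. Percentage agreement is used here as one simple summary for illustration; the text does not prescribe it. The record identifiers and ICD codes in the example are made up.

```python
import random
from typing import Dict, List, Optional, Sequence


def draw_sample(record_ids: Sequence[str], size: int, seed: Optional[int] = None) -> List[str]:
    """Simple random sample of record identifiers, without replacement."""
    return random.Random(seed).sample(list(record_ids), size)


def percent_agreement(original: Dict[str, str], review: Dict[str, str]) -> float:
    """Percentage of reviewed records whose reviewer code matches the original code."""
    if not review:
        raise ValueError("no reviewed records")
    matches = sum(1 for rid, code in review.items() if original.get(rid) == code)
    return 100 * matches / len(review)


original_codes = {"R01": "K35.9", "R02": "J18.9", "R03": "I21.9", "R04": "A09"}
sampled = draw_sample(list(original_codes), size=4, seed=1)
# The reviewer codes the sampled records without seeing original_codes.
reviewer_codes = {"R01": "K35.9", "R02": "J18.0", "R03": "I21.9", "R04": "A09"}
print(percent_agreement(original_codes, {rid: reviewer_codes[rid] for rid in sampled}))  # 75.0
```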

Confidentiality of medical data 

Patients should be assured that personal and private information given to the
health worker will remain confidential. It is desirable, therefore, for each country
to have laws that enforce this right of patients.

It is recognized that, because of the complexity and the hierarchical organization
of health care, records of information given in confidence will inevitably be
seen by staff other than those to whom the information was given.

If a patient's personal data need to be published in a report, the identifying information
should be given in coded format, unless the patient grants permission
to publish detailed identification data.
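One way to give identifying information in coded format is to replace each patient identifier with a code derived from a secret key held by the medical records department. With this approach the same patient always receives the same code, but the code cannot be traced back to the patient without the key. The keyed hash (HMAC-SHA-256) used below is an assumption made for illustration; the text does not prescribe a coding method. The key and identifier shown are placeholders.

```python
import hashlib
import hmac


def coded_identifier(patient_id: str, secret_key: bytes, length: int = 10) -> str:
    """Stable, non-reversible code for a patient identifier (HMAC-SHA-256, truncated)."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length].upper()


key = b"placeholder-key-kept-by-records-department"
print(coded_identifier("MRN-004512", key))
```

Truncating the digest shortens the code but introduces a small chance that two patients receive the same one. A longer `length` reduces that risk.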

Utilization of medical records and health facility data 

Limitations of medical records 

The limitations of the medical record as a source of health management data are 
as follows: 

(a) The information does not cover all episodes of illness that occur in the community.
For many illnesses, people do not seek medical care. Ambulatory
services often have no appropriate health record system, and general practitioners
in most countries have not yet been encouraged to keep patient records
in such a way that they can be used for epidemiological purposes.

(b) As sources of morbidity data, medical records understate the level of morbidity
due to conditions that are difficult to diagnose and categorize.

(c) The patient-related data are subject to the vagaries of recall. This applies to all
situations where a person is asked to recall previous events, but particularly
so in the health facility situation, where the person providing the information
is usually apprehensive at being in a strange environment.

Data needed for health management 

A primary purpose of statistical data is to provide information to guide internal
operational management. The kinds of statistical data needed may vary considerably
from one health facility to another, depending on individual management methods
and problems.

Regardless of the primary concerns and individual problems of health facility 
administrators, correct and complete statistical data are needed to: 

• establish administrative control over functional activities; 

• provide a basis for preparing operating budgets; 
• render reports to governing bodies and outside agencies;

• provide a basis for the distribution of expenses when computing the cost of 
operations; 

• provide a basis for the calculation of average income and costs per unit of 
service rendered. 

The data elements from the medical record required for national, regional and
local records include:

• sex; 

• age; 

• marital status; 

• separation (final or discharge) diagnoses (ICD codes);¹ if the patient dies
earlier than 48 hours after admission, the length of stay should also be indicated,
in hours;

• accident cause (ICD code);¹

• operations (ICPM code);²

• length of stay in health facility (for inpatients). 

Other data required for administrators include: 

• health facility infections; 

• outpatient records; 

• records of other departments (sections), such as laboratories, emergency 
services, ambulance, accounting. 

Indices of quality of patient care and their uses 

Formulae for the following indices of quality of care provided by the health 
facility to the patient are given in Handout 17.2. 

(a) Mortality rates (for health facility): 

• gross death rate; 

• net death rate; 

• anaesthesia death rate; 

• postoperative death rate; 

• death rate in maternity unit; 

• infant death rate; 

• fetal death rate. 

( b ) Health facility infection rates: 

• gross infection rate; 

• net infection rate; 

• postoperative infection rate. 



1 See Outline 18. 

2 See International classification of procedures in medicine. Geneva, World Health Organization, 1978. 
(c) Other rates:

• autopsy rate; 

• caesarean section rate;

• unnecessary surgery rate. 

Indices of utilization of health facility services and their uses 

Methods of calculating the following health facility indices are given in Handout 
17.3. 

(a) Average duration of stay in health facility; 

(b) Bed occupancy ratio; 

(c) Turnover interval. 
Admission; data elements; follow-up care; International Classification of Diseases (ICD); International
Classification of Procedures in Medicine (ICPM); inpatient; medical record; outpatient; patient
identification data; separation; separation diagnosis; separation summary; health facility beds;
bed capacity; health facility deaths; patient day; length of stay; percentage of occupancy; health
facility infections and infection rates (gross, net, postoperative); health facility death rates (gross,
net, maternal, infant, postoperative, anaesthesia); autopsy rate; caesarean section rate; average
length of stay; bed occupancy; turnover interval.



Structure of the lesson 

(a) Explain the need for evaluation of health facility services from the point of view of quality of
care of the patient, health facility utilization and other administrative activities. 

(b) Explain the need for medical records for producing health facility statistics, and the value of
medical records to the patient, health facility, physician, medical researcher and teacher. 

(c) Explain the role of health workers as essential contributors to, and users of, the medical 
record systems. 

(d) Discuss the need to record accurate data for each patient or reporting centre, and explain 
how checks on the accuracy of the data can be made. 

(e) Discuss the most common problems that a medical record department may encounter. (A
medical record officer may be invited to carry out this discussion, giving real examples from 
the medical records department.) 

(f) Discuss the legal requirements for medical records.

(g) Discuss the indices for quality of care and utilization of health facility services, and their 
uses. 



Lesson exercises 

The teacher should obtain information on the utilization of the hospital services from the medical
records department of a hospital. The students should calculate some of the indices of quality
of patient care and utilization of health facility services.
If there is a large group of students, they may be divided into small working groups for the
purpose of the exercises. Each group may then be asked to carry out the following exercises. 



■ Collect the data needed for the calculation of the indices of quality of patient 
care and utilization of health facility services (listed above) from the medical 
records department of a hospital. 

■ From the data on bed status of the hospital, calculate: 

• bed occupancy ratio;

• turnover interval;

• average duration of stay. 
HANDOUT 17.1



Definitions of new terms and concepts



Admission: Formal acceptance, by a health facility, of a patient who is to receive medical or paramedical care while
occupying a health facility bed.

Data elements: Those items of information extracted for statistical purposes (for example, sex and age).

Health facility deaths: Deaths occurring after lodging a patient in an inpatient bed. Detailed records should be
kept showing whether each death occurred within or beyond 48 hours after lodging.

If a patient dies earlier than 48 hours after admission, length of stay should also be indicated in hours (for
calculation of the net death rate).

Deaths occurring before lodging (for example, in the emergency room, the ambulance or the lift) are not
classified as health facility deaths. Separate records must be kept for these events.

Fetal deaths (stillbirths) should be recorded separately.

International Classification of Procedures in Medicine (ICPM): Published by the World Health Organization,
it classifies the procedures used in the different branches of medicine (such as surgical, radiological, laboratory
and preventive).

Inpatient: A patient occupying a bed in a health care institution for the purposes of receiving medical or paramedical
treatment (that is, an admitted patient).

Inpatient bed: A bed regularly maintained for use by inpatients who are receiving continual physician or dentist
services and are lodged in a continuous nursing service area of the health facility.

Inpatient bed capacity: The number of beds regularly maintained for inpatients in a health facility.

Inpatient census: The number of inpatients occupying beds in the health facility at a given time.

Length of stay: Number of days an inpatient has stayed in the health facility. It is computed by subtracting the
admission date from the separation date (the admission day is counted but the separation day is not counted).
Admission and separation on the same day is counted as one day.

Medical record: A cumulative narrative of the history of a patient, the treatment given, final diagnosis, and continuing
care following separation.

Outpatient: A patient whose visit to a health care institution is confined to only a few hours and who is not
accommodated overnight.

Patient day: The unit of measure denoting lodging facilities provided and services rendered to one inpatient between
the census-taking hour on two successive days.

Patient identification data: The information required for the unique identification of an individual patient.

Percentage of occupancy: The ratio of actual patient days to the maximum patient days as determined by bed
capacity, during any given period of time.

Separation: The termination of the occupation of a health facility bed by a patient either through discharge, transfer
to another health care institution, or death.

Separation diagnosis: The final diagnosis made at, or following, the patient's separation. (The diagnosis made on 
admission is usually a provisional one.) 

Separation summary: A summary written, or dictated, by a doctor about the case, setting out the essential facts
from the record: symptoms, previous history, diagnosis, laboratory and X-ray findings, treatment given, operations
performed, information given to the patient, and further treatment arranged or prescribed.

Transfer of inpatient: The movement of the patient from one type of accommodation to another. A transfer is not a
new admission.

Turnover interval: The mean number of days a bed is not occupied between two admissions. 



HANDOUT 17.2



Indices of quality of care



Health facility mortality rates (death rates) 

Deaths occurring in the emergency room of the health facility or in the ambulance on the way to the health facility or 
anywhere before a patient is lodged in an inpatient bed are not included in the computation of health facility death 
rates. 

Health facility death rates are usually calculated for one year; for this reason, the period is not specified in the
formulae. The term "separations" in the denominator includes discharges, deaths and transfers to other health
institutions during the time specified in the numerator.

Gross death rate 

The gross death rate includes all deaths occurring among inpatients. 

$$\text{Gross death rate} = \frac{\text{Total number of health facility deaths} \times 100}{\text{Total number of separations}}$$

This is a very rough indicator of the quality of patient care, since it takes no account of when the deaths occur. If,
for example, most of the deaths occur within the first 48 hours after the patients are lodged in the clinics, not much
can be said about the quality of patient care. If, however, most of the deaths occur more than 48 hours after admission
to the clinics, a detailed study of the deaths can provide useful information on the quality of patient care. Because of
this disadvantage, the gross death rate is not a good indicator of the level of mortality in a health facility; the net
death rate is a better indicator.



Net death rate 

The net death rate includes only the deaths occurring 48 hours or more after admission to a health facility.



$$\text{Net death rate} = \frac{\text{Number of deaths occurring 48 hours or more after admission} \times 100}{\text{Total number of separations} - \text{Number of deaths occurring within 48 hours after lodging}}$$



This is a good indicator of the quality of patient care in a health facility. A 48-hour period is regarded as sufficient for
a health facility to diagnose a patient's illness and start curative measures. If this rate is higher than expected (according
to the standards of the country), then the records of all the deaths should be studied in detail.
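Both death rates can be computed from the same list of separations, provided each death carries the number of hours between admission and death, which is what the definitions in Handout 17.1 require for deaths within 48 hours. The sketch assumes a hypothetical `Separation` record with that field. The counts in the example are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Separation:
    died: bool
    hours_after_admission: float = 0.0  # only used for deaths


def gross_death_rate(separations: List[Separation]) -> float:
    deaths = sum(1 for s in separations if s.died)
    return 100 * deaths / len(separations)


def net_death_rate(separations: List[Separation]) -> float:
    early = sum(1 for s in separations if s.died and s.hours_after_admission < 48)
    late = sum(1 for s in separations if s.died and s.hours_after_admission >= 48)
    # Deaths within 48 hours are left out of the numerator and subtracted from the denominator.
    return 100 * late / (len(separations) - early)


records = (
    [Separation(died=False) for _ in range(196)]
    + [Separation(died=True, hours_after_admission=h) for h in (2, 5, 11, 20, 30, 40, 44, 47, 12, 6)]
    + [Separation(died=True, hours_after_admission=h) for h in (60, 96, 150, 300)]
)
print(round(gross_death_rate(records), 2))  # 6.67
print(net_death_rate(records))              # 2.0
```

In this example, 10 of the 14 deaths occur within 48 hours. Excluding them is what lowers the net rate so far below the gross rate.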



Anaesthesia death rate 

Deaths occurring on the operating table and caused by anaesthetic agents (but not surgical complications) are included
in this rate.



$$\text{Anaesthesia death rate} = \frac{\text{Number of deaths due to anaesthetic agents} \times 100}{\text{Total number of anaesthetics administered}}$$



Postoperative death rate 

Deaths occurring within the 10 days immediately following a surgical operation are included in this rate. 
$$\text{Postoperative death rate} = \frac{\text{Number of deaths within 10 days of surgical operation} \times 100}{\text{Total number of surgical operations}}$$
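To apply the 10-day rule, each operation needs its date and, where the patient died, the date of death. The sketch counts a death if it occurs no more than 10 days after the operation date. Whether day 10 itself is included is an assumption, because the text does not settle it. The dates are invented.

```python
from datetime import date
from typing import List, Optional, Tuple

Operation = Tuple[date, Optional[date]]  # (operation date, date of death or None)


def postoperative_death_rate(operations: List[Operation], window_days: int = 10) -> float:
    deaths = sum(
        1
        for op_date, death_date in operations
        if death_date is not None and 0 <= (death_date - op_date).days <= window_days
    )
    return 100 * deaths / len(operations)


ops = [
    (date(2024, 3, 1), None),
    (date(2024, 3, 2), date(2024, 3, 5)),   # died 3 days after operation: counted
    (date(2024, 3, 3), date(2024, 3, 20)),  # died 17 days after operation: not counted
    (date(2024, 3, 4), None),
]
print(postoperative_death_rate(ops))  # 25.0
```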



Maternity-unit death rate 

Deaths occurring among inpatients due to pregnancy, delivery and postpartum complications are included in this rate.



$$\text{Maternity-unit death rate} = \frac{\text{Number of deaths due to pregnancy, delivery and postpartum complications} \times 100}{\text{Total number of obstetric patients separated}}$$



Infant death rate 

Only the infants who were born and who died in the health facility are included in the calculation of this rate. 



$$\text{Infant death rate} = \frac{\text{Number of infant deaths among infants born in the health facility} \times 100}{\text{Total number of infants separated}}$$



Fetal death rate 

Deaths of fetuses occurring in the health facility after 20 or more weeks of gestation are included in this rate. 



$$\text{Fetal death rate} = \frac{\text{Number of fetal deaths occurring in the health facility after 20 or more weeks of gestation} \times 100}{\text{Total number of births in the health facility}}$$



Health facility infection rates 

Infections which occur following clean wounds, operations or births, or develop in patients after admission to the 
health facility, are classified as nosocomial or non-nosocomial according to the health facility's guidelines. 

Health facility infection rates are normally calculated for one year. The term separations in the denominator includes 
discharges, deaths and transfers to other health institutions during the time specified in the numerator. 



Gross infection rate 

The gross infection rate includes all infections occurring after admission to the health facility. 

$$\text{Gross infection rate} = \frac{\text{Number of infections recorded} \times 100}{\text{Total number of separations}}$$

This is a very rough indicator of the quality of patient care, as it does not differentiate between health facility (nosocomial) 
and non-health facility (non-nosocomial) infections. Net infection rate is a better indicator than gross infection rate. 



Net infection rate 

The net infection rate includes health facility infections only. 



$$\text{Net infection rate} = \frac{\text{Number of infections attributed to the health facility} \times 100}{\text{Total number of separations}}$$
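Suppose each recorded infection has been flagged as attributed to the health facility (nosocomial) or not, following the facility's guidelines. The gross and net infection rates then share the same denominator. Below is a sketch with invented figures; the function name is hypothetical.

```python
from typing import List, Tuple


def infection_rates(nosocomial_flags: List[bool], separations: int) -> Tuple[float, float]:
    """Return (gross, net) infection rates per 100 separations.

    nosocomial_flags has one entry per recorded infection: True if the
    infection is attributed to the health facility.
    """
    gross = 100 * len(nosocomial_flags) / separations  # all recorded infections
    net = 100 * sum(nosocomial_flags) / separations    # facility-attributed infections only
    return gross, net


print(infection_rates([True, True, False, True, False], separations=250))  # (2.0, 1.2)
```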



Postoperative infection rate 

All infections among clean surgical cases before separation are included in this rate. 



$$\text{Postoperative infection rate} = \frac{\text{Number of infections in clean surgical cases} \times 100}{\text{Total number of operations}}$$



Other rates 

Autopsy rate 

The autopsy rate includes only the autopsies performed on inpatients who have died. Autopsies on stillborn babies,
on patients dead on arrival or dying in the emergency room, and on cases released to legal authorities are not
included.



$$\text{Autopsy rate} = \frac{\text{Number of autopsies} \times 100}{\text{Total number of deaths}}$$
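The exclusions listed above can be encoded as categories attached to each death. In this sketch the category labels are hypothetical. The sketch also applies the same exclusions to the denominator as to the numerator. That is an interpretation, since the formula itself says only "total number of deaths".

```python
from dataclasses import dataclass
from typing import List

# Hypothetical labels for the deaths excluded in the text
EXCLUDED = {"stillbirth", "dead_on_arrival", "emergency_room", "released_to_legal_authorities"}


@dataclass
class Death:
    category: str  # "inpatient" or one of EXCLUDED
    autopsied: bool


def autopsy_rate(deaths: List[Death]) -> float:
    eligible = [d for d in deaths if d.category not in EXCLUDED]
    autopsies = sum(1 for d in eligible if d.autopsied)
    return 100 * autopsies / len(eligible)


deaths = [
    Death("inpatient", True),
    Death("inpatient", False),
    Death("inpatient", True),
    Death("inpatient", False),
    Death("dead_on_arrival", True),                # excluded
    Death("released_to_legal_authorities", True),  # excluded
]
print(autopsy_rate(deaths))  # 50.0
```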



Caesarean section rate 

This is the ratio of caesarean sections performed to total deliveries. 



$$\text{Caesarean section rate} = \frac{\text{Number of caesarean sections performed} \times 100}{\text{Total number of deliveries}}$$



Unnecessary surgery rate 

This is the ratio of unnecessarily performed operations to the total number of operations performed. 



$$\text{Unnecessary surgery rate} = \frac{\text{Number of biopsy materials reported as normal tissue by pathologist} \times 100}{\text{Total number of operations}}$$
HANDOUT 17.3



Indices of utilization of health facility services 



Average duration of stay in health facility 

The average duration of stay in a health facility is the total number of inpatient days of care provided to separated 
patients (exclusive of newborn babies) in a period divided by the total number of separated patients (exclusive of 
newborn babies). In computing the length of stay, the day of admission is counted but the day of discharge is not 
counted. Admission and separation on the same day is counted as one day. The formula for computing the average 
length of stay of inpatients is: 

$$\text{Average duration of stay in health facility} = \frac{\text{Total number of inpatient days of care provided to separated patients}}{\text{Total number of separations}}$$

In order to have more detailed information, the average length of stay should be calculated separately for each 
service, for each disease and for other variables. 
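The counting rule for length of stay (admission day counted, separation day not, same-day stays counted as one day) can be combined with the advice to compute the average per service, as in the sketch below. The service names and dates are invented. Newborn babies are assumed to have been removed from the input beforehand.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def length_of_stay(admission: date, separation: date) -> int:
    """Admission day counted, separation day not; same-day admission and separation = 1 day."""
    days = (separation - admission).days
    if days < 0:
        raise ValueError("separation date precedes admission date")
    return max(days, 1)


def average_stay_by_service(stays: Iterable[Tuple[str, date, date]]) -> Dict[str, float]:
    """stays: (service, admission date, separation date) for separated patients, newborns excluded."""
    total_days: Dict[str, int] = defaultdict(int)
    separations: Dict[str, int] = defaultdict(int)
    for service, admitted, separated in stays:
        total_days[service] += length_of_stay(admitted, separated)
        separations[service] += 1
    return {s: total_days[s] / separations[s] for s in total_days}


stays = [
    ("Surgery", date(2024, 5, 1), date(2024, 5, 6)),    # 5 days
    ("Surgery", date(2024, 5, 3), date(2024, 5, 3)),    # same day: 1 day
    ("Medicine", date(2024, 5, 2), date(2024, 5, 10)),  # 8 days
]
print(average_stay_by_service(stays))  # {'Surgery': 3.0, 'Medicine': 8.0}
```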

This index may be used to plan a waiting list. 

Bed occupancy ratio 

This is the ratio of occupied bed-days to the available bed-days as determined by bed capacity, during any given period 
of time. The formula is: 



$$\text{Bed occupancy ratio} = \frac{\text{Actual number of occupied bed-days} \times 100}{\text{Available bed-days}}$$



The index should be calculated for all groups of inpatients who are normally assigned to beds specifically maintained 
for such groups. It should also be calculated for each service separately. 

The bed occupancy ratio is used to measure the utilization of health facility beds. 
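A patient day is counted between census-taking hours. Occupied bed-days for a period can therefore be taken as the sum of the daily inpatient census counts, and available bed-days as bed capacity multiplied by the number of days. The sketch assumes bed capacity was constant over the period. The census figures are invented.

```python
from typing import List


def bed_occupancy_ratio(daily_census: List[int], bed_capacity: int) -> float:
    """Occupied bed-days as a percentage of available bed-days."""
    occupied_bed_days = sum(daily_census)                  # one patient day per occupied bed per census
    available_bed_days = bed_capacity * len(daily_census)  # constant capacity assumed
    return 100 * occupied_bed_days / available_bed_days


census = [16, 18, 15, 17, 14]  # five daily census counts in a 20-bed ward
print(bed_occupancy_ratio(census, bed_capacity=20))  # 80.0
```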



Turnover interval 

This is the mean number of days that a bed is not occupied between two admissions. The formula is: 



$$\text{Turnover interval} = \frac{\text{Number of vacant bed-days}}{\text{Total number of separations}}$$



The turnover interval is used to measure the demand for, or pressure on, beds. 
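Vacant bed-days are the available bed-days not occupied by patients, so the turnover interval can be derived from daily census counts, bed capacity and the number of separations. This sketch assumes bed capacity was constant over the period. The figures are invented.

```python
from typing import List


def turnover_interval(daily_census: List[int], bed_capacity: int, separations: int) -> float:
    """Mean number of days a bed stays empty between one occupant and the next."""
    available_bed_days = bed_capacity * len(daily_census)  # constant capacity assumed
    vacant_bed_days = available_bed_days - sum(daily_census)
    return vacant_bed_days / separations


census = [16, 18, 15, 17, 14]  # five daily census counts in a 20-bed ward
print(turnover_interval(census, bed_capacity=20, separations=10))  # 2.0
```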
OUTLINE 18 International Classification of Diseases
(ICD) and certification of causes of death 



Introduction to the lesson 

The development of public health, especially in the fields of health management, health care 
and research, brought about the need and opportunity to collect and compare large amounts of 
data of high quality. The use of the International Classification of Diseases (ICD) as the basic 
format has contributed to the comparability between data collection systems. The ICD also provides
a basis that can be adapted for use in other fields, for example, dentistry, oncology and
ophthalmology. 

Determination of disease-specific death rates depends on accurate completion of death certificates.
It is, therefore, essential that doctors complete the certification of death correctly.

Objective of the lesson 

The objective of this lesson is to introduce the students to the principles of the ICD, its application,
and the completion of death certificates.

Enabling objectives 

At the end of the lesson the students should be able to: 

(a) Explain the purpose of classifying diseases and causes of death, and discuss the problems 
encountered in doing so. 

(b) Explain the structure of the ICD and its uses.

(c) Discuss the problems in defining cause of death, give the definition of underlying cause of 
death, and explain its use and application. 

(d) Correctly complete a certification of cause of death on a prescribed form, on the basis of a
patient's medical file. 

Required previous knowledge 

Concepts of the etiology of diseases; concept of the natural history of a disease process, and of 
associated signs and symptoms; disease-specific morbidity and mortality indices. 



Lesson content 

Most of the materials in this section are taken from International Statistical Classification
of Diseases and Related Health Problems, Tenth revision (ICD-10). Vols 1-3.
Geneva, World Health Organization, 1992-1994.

General principles 

A classification of diseases may be defined as a system of categories to which
morbid entities are assigned according to some established criteria. There are
many possible choices for these criteria. The anatomist, for example, may desire
a classification based on the part of the body affected, whereas the pathologist is
primarily interested in the nature of the disease process. The public health practitioner
would be more interested in the disease etiology, and the clinician in the
particular manifestations requiring care. There are therefore many axes of classification,
and the particular axis selected will depend on the interest of the
investigator.

A statistical classification of disease and injury will depend, therefore, on the use 
to be made of the statistics to be compiled. Adjustments must be made to meet 
the varied requirements of vital statistics offices, hospitals of different types, 
medical services of the armed forces, social insurance organizations, sickness 
surveys, and numerous other agencies. While no single classification will fit 
all the specialized needs, it should provide a common basis of classification for 
general statistical use: that is, storage, retrieval and tabulation of data. 

A statistical classification of disease must be confined to a limited number of 
categories that encompass the entire range of morbid conditions. The categories 
should be chosen so that they will facilitate the statistical study of disease 
phenomena. 

It is the element of grouping in a statistical classification that distinguishes it
from a nomenclature (a list or catalogue of approved names for morbid conditions),
which must be extensive in order to accommodate all pathological conditions.
The concepts of classification and nomenclature are, nevertheless, closely
related in the sense that some classifications, for example in zoology, are so
detailed that they become nomenclatures. Such classifications, however, are
generally unsuitable for statistical analysis.

In order to make accurate comparisons of morbidity or mortality data, specified
for various diseases or causes of death, it is essential that a uniform classification
be used throughout the world. Such a classification was introduced many years
ago and is known as the International Classification of Diseases (ICD). Since its
inception, it has been revised about once every 10 years; the latest revision (ICD-10)
was adopted by the Forty-third World Health Assembly in 1989.

The International Classification of Diseases, tenth revision (ICD-10)¹

ICD-10 is presented in three volumes: Volume 1 contains the main classification;
Volume 2 provides guidance to users of the ICD; and Volume 3 contains the
index to the classification. The previous revisions, ICD-8 and 9, had been
presented in two volumes, comprising the main classification and alphabetical
index.

Volume 1: main classification 

Most of Volume 1 is taken up with the main classification, composed of the list
of three-character categories, and the tabular list of inclusions and four-character
subcategories. The "core" classification, comprising the list of three-character
categories (Volume 1, pp. 29-104), is the mandatory level for reporting to the
WHO mortality database and for general international comparisons. This core
classification also lists chapter and block titles. The tabular list, giving the full
detail of the four-character level, is divided into 21 chapters (pp. 105-1175).

1 This part of the lesson content is abstracted from International Statistical Classification of Diseases and
Related Health Problems, Tenth revision (ICD-10). Vol. 2: Instruction manual. Geneva, World Health
Organization, 1993.

Volume 1 also covers the morphology of neoplasms. The classification of morphology
of neoplasms (pp. 1177-1204) may be used, if desired, as an additional
code to classify the morphological type of neoplasms which, with a few exceptions,
are classified in Chapter II only according to behaviour and site (topography).
The morphology codes are the same as those used in the special adaptation
of ICD for oncology (ICD-O).

Special tabulation lists 

Because the full four-character list of the ICD, and even the three-character list, 
are too long to be presented in every statistical table, most routine statistics use 
a tabulation list that emphasizes certain single conditions and groups others. 

Four special lists for the tabulation of mortality and one list for the tabulation of
morbidity were adopted by the World Health Assembly for ICD-10 in 1990. These
lists are as follows:

• List 1 General mortality — condensed list (103 causes); 

• List 2 General mortality — selected list (80 causes); 

• List 3 Infant and child mortality — condensed list (67 causes); 

• List 4 Infant and child mortality — selected list (51 causes); 

• Tabulation list for morbidity (298 causes). 

For mortality, the list number is used as a prefix to the item numbers to prevent
confusion over which special tabulation list was used for coding. For example, if
the cause of death is "tetanus", it is coded as follows under the different mortality
lists:

• If list 1 is used: 1-008; 

• If list 2 is used: 2-007; 

• If list 3 is used: 3-005; 

• If list 4 is used: 4-004. 
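The prefixed item numbers for tetanus shown above follow a simple pattern: the list number, a hyphen, and a three-digit item number. The sketch below reproduces them. The function name is hypothetical, and the item numbers are the ones quoted in the text.

```python
def tabulation_code(list_number: int, item_number: int) -> str:
    """Prefix a special-tabulation-list item number with its list number, e.g. 1-008."""
    return f"{list_number}-{item_number:03d}"


# Item numbers for "tetanus" in mortality lists 1-4, as given above
tetanus_items = {1: 8, 2: 7, 3: 5, 4: 4}
print([tabulation_code(lst, item) for lst, item in tetanus_items.items()])
# ['1-008', '2-007', '3-005', '4-004']
```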

For the national display of mortality and morbidity data, countries are free
to use any lists constructed from the items of the basic list, but to ensure a
minimum of international comparability, any tabulation lists used for these purposes
should contain the headings used in the mortality and morbidity lists of
ICD-10.

Definitions 

The definitions on pp. 1233-1238 of Volume 1 have been adopted by the World 
Health Assembly and are included to facilitate the international comparability of 
data. 
Nomenclature regulations

The regulations adopted by the World Health Assembly set out the formal re- 
sponsibilities of WHO Member States regarding the classification of diseases and 
causes of death, and the compilation and publication of statistics. They are set 
out on pp. 1239-1243 of Volume 1.

Volume 2: instruction manual 

This brings together the notes on certification and classification formerly in- 
cluded in Volume 1 with a good deal of new background and instructional mat- 
ter and guidance on the use of Volume 1, on tabulations, and on planning for 
the use of ICD, which was seen as lacking in earlier revisions. It also includes the 
historical material formerly presented in the introduction to Volume 1. 

Volume 3: alphabetical index 

This presents the index itself, with an introduction and expanded instructions 
on its use. 

Chapters and coding system of ICD-10

ICD-10 consists of 21 chapters and uses an alphanumeric code with a letter in
the first position and numbers in the second, third and fourth positions. The
fourth character follows a decimal point. Possible code numbers, therefore, range
from A00.0 to Z99.9.

Examples: 

• K35 acute appendicitis;

• K35.0 acute appendicitis with generalized peritonitis; 

• K35.1 acute appendicitis with peritoneal abscess; 

• K35.9 acute appendicitis, unspecified. 

The letter U is not used. It was left for coding the provisional assignment of new 
diseases of uncertain etiology (codes U00-U49) and for use in research (codes 
U50-U99), for example when testing an alternative sub-classification for a 
special project. 
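The structure of a code can be checked mechanically. The following Python sketch assumes codes are written with a decimal point, as in the examples above, and accepts X as a fourth character because ICD-10 recommends X as a filler for three-character categories that are not subdivided. The function name `parse_icd10` is hypothetical, and the sketch does not check that a category actually exists in the tabular list.

```python
import re

# A letter, two digits, then optionally a decimal point and a fourth character.
# The fourth character is a digit, or X as a filler for undivided categories.
ICD10_PATTERN = re.compile(r"^([A-Z])(\d{2})(?:\.([0-9X]))?$")


def parse_icd10(code: str) -> dict:
    """Split a code such as "K35.1" into its three-character category and
    fourth character (None if no fourth character is given)."""
    match = ICD10_PATTERN.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not an ICD-10 style code: {code!r}")
    letter, digits, fourth = match.groups()
    if letter == "U":
        # U00-U49: provisional new diseases; U50-U99: research use
        raise ValueError(f"{code!r} uses the reserved letter U")
    return {"category": letter + digits, "subcategory": fourth}


print(parse_icd10("K35.1"))  # {'category': 'K35', 'subcategory': '1'}
print(parse_icd10("K35"))    # {'category': 'K35', 'subcategory': None}
```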

Each letter is associated with a particular chapter, except for the letter D, which 
is used in both Chapter II, Neoplasms, and Chapter III, Diseases of blood and 
blood-forming organs and certain disorders involving the immune mechanism, 
and the letter H, which is used in both Chapter VII, Diseases of the eye and adnexa,
and Chapter VIII, Diseases of the ear and mastoid process. Four chapters
(Chapters I, II, XIX, and XX) use more than one letter in the first position of their
codes. 
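A letter-to-chapter lookup makes the D and H exceptions explicit. This Python sketch assumes the original ICD-10 block ranges (C00-D48 for Chapter II, D50-D89 for Chapter III, H00-H59 for Chapter VII and H60-H95 for Chapter VIII), which come from the tabular list in Volume 1 rather than from this outline. `chapter_of` is a hypothetical helper.

```python
# First letter -> chapter, for letters used by a single chapter.
LETTER_TO_CHAPTER = {
    "A": "I", "B": "I", "C": "II", "E": "IV", "F": "V", "G": "VI",
    "I": "IX", "J": "X", "K": "XI", "L": "XII", "M": "XIII", "N": "XIV",
    "O": "XV", "P": "XVI", "Q": "XVII", "R": "XVIII", "S": "XIX", "T": "XIX",
    "V": "XX", "W": "XX", "X": "XX", "Y": "XX", "Z": "XXI",
}


def chapter_of(code: str) -> str:
    """Return the Roman-numeral chapter of an ICD-10 code such as "D50.0"."""
    letter = code[0].upper()
    number = int(code[1:3])
    if letter == "D":  # D00-D48 neoplasms; D50-D89 blood and immune disorders
        return "II" if number < 50 else "III"
    if letter == "H":  # H00-H59 eye and adnexa; H60-H95 ear and mastoid
        return "VII" if number < 60 else "VIII"
    if letter not in LETTER_TO_CHAPTER:
        raise ValueError(f"no chapter uses the letter {letter!r}")
    return LETTER_TO_CHAPTER[letter]


print(chapter_of("K35.0"), chapter_of("D50.0"), chapter_of("H66.9"))  # XI III VIII
```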

Chapters I to XVII relate to diseases and other morbid conditions, and Chapter 
XIX to injuries, poisoning and certain other consequences of external causes. 
The remaining chapters complete the range of subject matter nowadays included 
in diagnostic data. Chapter XVIII covers symptoms, signs and abnormal clinical 
and laboratory findings, not elsewhere classified. Chapter XX, External causes 
of morbidity and mortality, was traditionally used to classify causes of injury 
and poisoning, but, since the ninth revision, has also provided for any recorded 
external cause of diseases and other morbid conditions. Finally, Chapter XXI,
Factors influencing health status and contact with health services, is intended 
for the classification of data explaining the reason for contact with health care 
services of a person not currently sick, or the circumstances in which the patient 
is receiving care at that particular time or otherwise having some bearing on 
that person's care. 

The list of the chapters of ICD-10 is given in Handout 18.1. 

Blocks of categories 

The chapters are subdivided into homogeneous blocks of three-character cate- 
gories. In Chapter I, the block titles reflect two axes of classification: mode of 
transmission and broad group of infecting organisms. In Chapter II, the first axis 
is the behaviour of neoplasms. Within behaviour, the axis is mainly by site, al- 
though a few three-character categories are provided for important morphologi- 
cal types (for example, leukaemia, lymphomas, melanomas, mesotheliomas, 
Kaposi sarcoma). The range of categories is given in parentheses after each block 
title. 

Three-character categories 

Within each block, some of the three-character categories are for single conditions,
selected because of their frequency, severity or susceptibility to public health 
intervention, while others are for groups of diseases with some common charac- 
teristics. There is usually provision for "other" conditions to be classified, allow- 
ing many different but rarer conditions, as well as "unspecified" conditions, to 
be included. 

Four-character subcategories 

Although not mandatory for reporting at the international level, most of the 
three-character categories are subdivided by means of a fourth, numeric, char- 
acter after a decimal point, allowing up to ten subcategories. Where a three- 
character category is not subdivided, it is recommended that the letter X be used 
to fill the fourth position so that the codes are of a standard length for data 
processing. 
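A minimal sketch of the X-filling recommendation, in Python. It assumes that a three-character code is supplied only for categories that are not subdivided (the function cannot check this against the tabular list) and that the decimal point is kept; `to_standard_length` is a hypothetical name.

```python
def to_standard_length(code: str) -> str:
    """Pad an undivided three-character category to four characters with X,
    e.g. "A33" -> "A33.X"; codes that already have a fourth character are kept."""
    code = code.strip().upper()
    if len(code) == 3:
        return code + ".X"
    if len(code) == 5 and code[3] == ".":
        return code
    raise ValueError(f"unexpected code length: {code!r}")


print([to_standard_length(c) for c in ["K35.1", "A33"]])  # ['K35.1', 'A33.X']
```

Fixed-length codes make it simpler to sort, validate and tabulate records during data processing.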

The four-character subcategories are used in whatever way is most appropriate, 
identifying, for example, different sites or varieties if the three-character cate- 
gory is for a single disease, or individual diseases if the three-character category 
is for a group of conditions. 

The fourth character .8 is generally used for "other" conditions belonging 
to the three-character category, and .9 is mostly used to convey the same 
meaning as the three-character category title, without adding any additional 
information. 

When the same fourth-character subdivisions apply to a range of three- 
character categories, they are listed once only, at the start of the range. A note at 
each of the relevant categories indicates where the details are to be found. For 
example, categories O03-O06, for different types of abortion, have common
fourth characters relating to associated complications (see Volume 1, p. 724). 
Cause of death

The causes of death to be entered on the medical certificate of cause of death are 
"all those diseases, morbid conditions or injuries which either resulted in or 
contributed to death and the circumstances of the accident or violence which 
produced any such injuries". This definition does not include symptoms and 
modes of dying, such as heart failure or asthenia. 

From the standpoint of prevention of deaths, it is important to break the chain 
of events or to institute the cure at some point. Statistics on the underlying 
cause of death would be the most useful for this purpose, and it is therefore 
recommended that the underlying cause, as defined below, be uniformly 
selected for primary tabulation of causes of death.

Underlying cause of death 

This is defined as: (a) the disease or injury that initiated the train of morbid 
events leading directly to death, or (b) the circumstances of the accident or 
violence that produced the fatal injury. 

Certificate of cause of death 

In order to secure uniform application of the above principle, it is implicit that 
the medical certification form recommended by the World Health Assembly 
should be used (see Figure 18.1). A different certificate is needed for perinatal
deaths (this is described in ICD-10, Volume 2, pp. 88-93). 

The medical certificate of cause of death is designed to elicit the information that 
will facilitate the selection of the underlying cause of death when two or more 
causes are jointly recorded. 



Figure 18.1 International form of medical certificate of cause of death

CAUSE OF DEATH                                                 Approximate interval
                                                               between onset and death
I
Disease or condition directly           (a) ...........................  ..........
leading to death*
                                        due to (or as a consequence of)
Antecedent causes                       (b) ...........................  ..........
Morbid conditions, if any,              due to (or as a consequence of)
giving rise to the above cause,         (c) ...........................  ..........
stating the underlying condition last   due to (or as a consequence of)
                                        (d) ...........................  ..........

II
Other significant conditions            ...............................  ..........
contributing to the death, but not      ...............................  ..........
related to the disease or condition
causing it

*This does not mean the mode of dying, for example, heart failure or respiratory
failure. It means the disease, injury or complication that caused death.
The certificate consists of two parts. In Part I, the cause leading directly to death
is reported in line (a), and the antecedent conditions that gave rise to the cause 
reported in line (a) are entered in lines (b) and (c), the underlying cause being 
stated last in the sequence of events. No entry is necessary in lines (b) and (c) if 
the disease or condition directly leading to death, stated in line (a), completely 
describes the train of events. 
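The layout of Part I can be mirrored in a few lines of Python to show where the underlying cause is expected to appear. This is a simplified sketch, not the ICD-10 procedure: it simply takes the condition on the lowest completed line, whereas the selection and modification rules in Volume 2 may lead to a different underlying cause being coded. The function name and the example certificate are hypothetical.

```python
from typing import Dict, Optional


def tentative_underlying_cause(part_one: Dict[str, str]) -> Optional[str]:
    """Return the condition on the lowest completed line of Part I.

    part_one maps the line labels "a" to "d" to the condition written there;
    blank or missing lines are skipped. Only the certificate layout is used;
    the ICD-10 selection and modification rules are NOT applied.
    """
    entries = [part_one.get(line, "").strip() for line in "abcd"]
    completed = [entry for entry in entries if entry]
    return completed[-1] if completed else None


certificate = {
    "a": "pulmonary embolism",
    "b": "pathological fracture",
    "c": "secondary carcinoma of femur",
    "d": "carcinoma of breast",
}
print(tentative_underlying_cause(certificate))  # carcinoma of breast
```

When only line (a) is completed, the sketch returns that condition, which matches the case where line (a) alone describes the whole train of events.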

Any other significant condition that unfavourably influenced the course of 
the morbid process and thus contributed to the fatal outcome, but that was 
not related to the disease or condition directly causing death, is entered in 
Part II. 

After the words "due to (or as a consequence of)", which appear on the
certificate, the certifier should enter not only the direct cause or pathological
process, but also indirect causes, for example where an antecedent condition has
predisposed to the direct cause by damage to tissues or impairment of function,
even after a long interval.



Classification; alphanumeric codes; codes (three-digit categories, four-digit subcategories); death 
certificate; International Classification of Diseases; nomenclature; underlying cause of death; long 
list; special tabulation lists. 



Structure of the lesson 

(a) Recall the uses of disease-specific morbidity and mortality indices, dealt with in previous 
lessons, and give examples to illustrate the relative importance of various diseases as causes 
of ill health and death (for example, show a distribution of the number of deaths by cause, 
arranged in order of magnitude; discuss the distributions in different age groups). 

(b) Differentiate between nomenclature and classification of diseases, and discuss the possibility 
of classifying diseases by various axes or criteria (for example, by anatomical site, etiology, 
disease process, signs and symptoms). Point out the need for a conventional or standard 
classification in order to ensure uniformity and comparability of data. 

(c) Explain the structure of ICD and the history of its development, drawing special attention 
to the new objectives and features introduced in the sixth revision (1948), that is, com- 
bined classification for use in both morbidity and mortality analyses, different levels of 
grouping, supplementary lists for special applications, definition of underlying cause of
death, and introduction of a recording form to facilitate clear and uniform certification
and correct identification of the underlying cause. Special attention should be paid to
the rationale for selecting the underlying cause of death for the primary tabulation of
causes of death. Describe the contents and layout of Volume 1 and Volume 3 of ICD-10.

(d) Explain the national and local requirements for procedures and documentation relating to 
the certification of causes of death (by physicians or other personnel), and discuss the differ- 
ences, if any, between these and the WHO recommendations. Examine the limitations with 
regard to reliability and completeness of cause-of-death certification, and their implications 
for comparability of statistics on causes of death over time and between places (for example, 
urban/rural) in national and international comparisons. (If a copy of Volume 2 of ICD-10 is
available, it is recommended that the teacher read through the section on "Rules and
guidelines for mortality and morbidity coding", pages 30-123.)

(e) Examples of genuine documents and data should be used liberally to illustrate the problems 
of obtaining good disease-specific data from morbidity and mortality records, and to indicate 
the steps that may be taken at the different levels of data recording, collecting and processing 
to raise the quality of disease-specific morbidity and mortality data in the country. 



Lesson exercises 

Obtain copies of medical certificates and provide information on the medical history of patients 
and cause of death from the records department of the local hospital. Ask the students to com- 
plete the form and insert the ICD codes. Ask the students to explain the advantages and disad- 
vantages of using ICD codes. 



■ Given several case histories leading to death, fill out the death certificate,
identifying the underlying cause. For example, an eighty-two-year-old white female
who had been receiving treatment for hypertension dies of a cerebral haemorrhage.
Use the information provided to fill out the medical certificate of cause of death.

■ Determine the ICD codes for the following disease conditions: ischaemic heart 
disease, malignant neoplasm of the stomach, urinary tract infection, pneumonia 
and chronic bronchitis. 



■ Provide the ICD codes for the top ten causes of morbidity in the country using 
the three-digit codes. 
Chapters of ICD-10

I Certain infectious and parasitic diseases 

II Neoplasms 

III Diseases of the blood and blood-forming organs and certain disorders involving the immune 
mechanism 

IV Endocrine, nutritional and metabolic diseases 

V Mental and behavioural disorders 

VI Diseases of the nervous system 

VII Diseases of the eye and adnexa

VIII Diseases of the ear and mastoid process

IX Diseases of the circulatory system 

X Diseases of the respiratory system 

XI Diseases of the digestive system 

XII Diseases of the skin and subcutaneous tissue 

XIII Diseases of the musculoskeletal system and connective tissue 

XIV Diseases of the genitourinary system 

XV Pregnancy, childbirth and the puerperium 

XVI Certain conditions originating in the perinatal period 

XVII Congenital malformations, deformations and chromosomal abnormalities 

XVIII Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified 
OUTLINE 19 Design of health investigations: health surveys and clinical trials



Introduction to the lesson 

Often, routinely collected data from health service records do not provide a complete description 
of the current health status of the population suitable for use in health service planning. On 
such occasions, carefully planned health surveys may be used to collect additional information. 

In clinical medicine, the benefits of most new treatments may not be obvious.
Observations made at the bedside by individual clinicians provide insufficient grounds
for deciding whether these new treatments are effective at all, or more effective than the usual
treatment. A well-designed scientific trial is the best way to obtain conclusive evidence.

Students are often required, as part of their training, to participate in community health surveys 
or to undertake individual projects, and may also be involved in clinical trials after qualifying. 
Moreover, all students read medical journals, most issues of which report the results of trials. If 
the students are to be able to assess such reports critically, they must have some knowledge of 
the principles of the design, conduct and uses of such trials. 

Objective of the lesson 

The objective of this lesson is to describe the design, execution, uses and interpretation of vari- 
ous types of health surveys and clinical trials. 

Enabling objectives 

At the end of the lesson the students should be able to: 

(a) Describe what is meant by a health survey and explain its uses. 

(b) Describe the steps to be taken in planning a health survey. 

(c) Describe the main principles of the design of a health questionnaire. 

(d) Design a simple health questionnaire for use in a particular survey under specific 
circumstances. 

(e) Explain the need for, and uses of, clinical trials. 

(f) Distinguish between the four phases of clinical trials.

(g) Distinguish between therapeutic and prophylactic trials.

(h) Explain, with examples, why controlled trials are necessary.

(i) State what is meant by historical controls and explain why they are unsatisfactory.

(j) Explain the meaning of, and need for, randomization in clinical trials.

(k) State what is meant by single-blind and double-blind trials.

(l) Explain what is meant by cross-over trials.

(m) Explain what is meant by matched controls.

(n) State what is meant by a sequential trial. 

(o) Outline the ethical issues raised by the use of controls and placebos, and by the need to have 
patients' consent to participate in the trial. 

Required previous knowledge 

The students should have a clear understanding of the statistical principles and methods covered 
in Outlines 1-10. Failure to have attained these learning objectives may seriously limit their 
understanding of the issues covered in this lesson. Previous exposure to epidemiological prin- 
ciples would be an asset. 



Lesson content 

Health survey 

Definition of a health survey 

A health survey is a planned study to investigate the health characteristics of a 
population. The health survey is used to: 

• measure the total amount of illness in the population; 

• measure the amount of illness caused by a specified disease; 

• study the nutritional status of the population; 

• examine the utilization of existing health care facilities and the demand for 
new ones; 

• measure the distribution in the population of particular characteristics, 
for example, haemoglobin level, serum cholesterol level, breastfeeding 
practice, contraceptive practice; 

• examine the role and relationship of one or more factors in the etiology of a 
disease. 

Planning a health survey 

Step 1: Preparation of a detailed written statement of the objectives of the survey

The objectives of the survey have to be clearly stated, if possible in measurable 
terms. Each objective must be examined to ensure that it is achievable given the 
resources of the survey (time, personnel and money) and the availability of data. 
A check should be made to determine whether information on some of the 
objectives is already available. 

Step 2: Determination of the items of information required, and specification of definitions,
criteria of classification and methods of collection

The survey objectives determine the items of data that need to be collected.
Only those items necessary for the survey to achieve its objectives should
be included; the inclusion of other items, on such grounds as "it would be
interesting to know . . ." or "it won't make any difference to ask just one more
question . . .", should be firmly resisted. The questionnaire should not be unduly
long.
The use of each item should be elaborated in terms of its intended classification,
tabulation and analysis. Dummy tables should be drawn up where relevant, and 
precoding of classes done where possible. 
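Precoding and dummy tables can be sketched together. In the Python example below, the items (source of water, age group) and their codes are hypothetical; the dummy table is an empty cross-tabulation that can be reviewed before fieldwork and later filled from the precoded responses.

```python
# Hypothetical precoded items: each permitted answer has a fixed code.
WATER_SOURCE = {1: "Piped", 2: "Well", 3: "River or stream", 9: "Not stated"}
AGE_GROUP = {1: "Under 15", 2: "15-44", 3: "45 and over"}


def dummy_table(row_codes, col_codes):
    """Empty cross-tabulation (every cell zero), drawn up before fieldwork."""
    return {r: {c: 0 for c in col_codes} for r in row_codes}


def tabulate(records, row_item, col_item, row_codes, col_codes):
    """Fill the dummy table from precoded records.

    A code outside the precoded list raises KeyError, flagging an entry
    that needs editing.
    """
    table = dummy_table(row_codes, col_codes)
    for record in records:
        table[record[row_item]][record[col_item]] += 1
    return table


records = [
    {"water": 1, "age": 2},
    {"water": 2, "age": 2},
    {"water": 1, "age": 1},
]
table = tabulate(records, "water", "age", WATER_SOURCE, AGE_GROUP)
for code, label in WATER_SOURCE.items():
    print(f"{label:<16}", [table[code][a] for a in AGE_GROUP])
```

If a row or column of the dummy table cannot be linked to a survey objective, that item is a candidate for removal from the questionnaire.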

Each item should be well defined, and the criteria and procedures to be used for 
its collection laid down. Data collectors should be trained to apply these criteria 
in a uniform manner throughout the survey. 

Step 3: Definition of the reference population on which information is to be sought

The reference population has to be defined both physically and demographically 
(that is, its location, size, structure, etc.). A clear definition of the reference popu- 
lation is essential for the determination of the appropriate sampling procedures 
and eventual interpretation of the findings. A complete specification of the popu- 
lation to be sampled is the sampling frame. 

Step 4: Decision on whether the reference population is to be studied as a whole or in part 
(sampled) 

In making this decision, the size of the reference population has to be consid- 
ered in relation to the resources available for the study. The advantages and 
disadvantages of a sample compared with a comprehensive survey should be 
considered (see Outline 8 for the advantages and disadvantages of collecting 
information through a sample). A so-called comprehensive survey may, in fact,
turn out to be a bad sample survey because of low response rates. 

Step 5: Determination of the number of units in the population to be selected for study 
during the survey 

Once it has been decided to take a sample, the optimum sample size must be 
determined, taking into account the following considerations (see Outline 8): 

• sampling error versus non-sampling errors;

• a sample size much larger than the optimum wastes resources; 

• a sample size much smaller than the optimum decreases the precision of the 
estimate and narrows the range of conclusions and generalizations; 

• optimum sample size depends on the prevalence or variability of the condi- 
tion being surveyed, and the desired precision. 

Step 6: Decision on how respondents will be selected from the population (sampling 
method) 

If only part of the population is to be examined, it is essential that those selected 
are a fair representation of the population. There are several scientific methods 
of selection that ensure fairness (see Outline 8). Some of these methods are 
more practical than others. It should be remembered that the best of samples 
can be ruined by low response rates. 
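Two of the selection methods described in Outline 8, simple random sampling and systematic sampling, can be sketched in Python as follows. The sampling frame here is a hypothetical list of household identifiers. For systematic sampling, the sketch assumes the frame is not ordered in a pattern that repeats with the sampling interval, which would bias the sample.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Select n units without replacement; every unit has the same chance."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def systematic_sample(frame, n, seed=None):
    """Select every k-th unit after a random start, with k = len(frame) // n."""
    k = len(frame) // n
    if k < 1:
        raise ValueError("sample size is larger than the sampling frame")
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start among the first k units
    return [frame[i] for i in range(start, start + n * k, k)]


frame = [f"HH{i:03d}" for i in range(1, 501)]  # hypothetical frame of 500 households
print(simple_random_sample(frame, 5, seed=42))
print(systematic_sample(frame, 50, seed=42)[:5])
```

Systematic sampling is often easier to carry out in the field (for example, every tenth house along a list), while simple random sampling requires a complete, numbered frame.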

Step 7: Design, testing and validation of the questionnaires or forms on which
observations will be recorded

A good questionnaire is essential to the success of the survey. Its design should
follow strict principles (see "Main principles of designing a questionnaire" in this
outline). It is essential that the draft forms be tested and validated before use in
the main survey.
Step 8: Selection and training of interviewers

The accuracy and reliability of the data collected depend on the interviewers. 
They should be carefully selected and properly trained in interview techniques, 
and should understand fully the prescribed definitions, criteria and methods. It 
is important to train all interviewers together to ensure uniformity of perfor- 
mance. Trials or dummy runs should be included in the training. 

Step 9: Collection of data 

The data collection process involves: 

• publicity to inform and solicit the cooperation of the population; 

• correct identification of selected sampling units (houses, respondents, 
villages); 

• transportation arrangements; 

• supervision and monitoring of the interviewers; 

• testing and checking of equipment; 

• retrieval of completed forms and preliminary editing. 

Step 10: Preparation for data analysis 

The design of forms should incorporate plans for analysis, precoded questions, 
coding lists for free texts, etc. Arrangements need to be made for data analysis 
facilities to be available: the personnel needed may include editing and data 
entry operators, key punch operators, programmers and statisticians. 

Relative advantages and disadvantages of sampling methods in health survey design 
Probability sampling: see under individual sampling method (Outline 8). 

Estimation of minimum sample size 

Certain information is required for the estimation of minimum sample size (n) 
of a health survey based on simple random sampling (see Outline 8 for details). 
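For the common case of estimating a prevalence p to within an absolute margin d, a standard formula under simple random sampling is n = z^2 p (1 - p) / d^2, with z the standard normal value for the chosen confidence level (about 1.96 for 95%). Outline 8 gives the details. This Python sketch ignores the finite population correction, design effects and allowance for non-response, all of which usually change the required n; `min_sample_size` is a hypothetical name.

```python
import math
from statistics import NormalDist


def min_sample_size(p, d, confidence=0.95):
    """Minimum n to estimate a proportion p to within +/- d (absolute)
    under simple random sampling, using n = z^2 p (1 - p) / d^2."""
    if not (0 < p < 1 and 0 < d < 1):
        raise ValueError("p and d must lie between 0 and 1")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z value
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


print(min_sample_size(0.3, 0.05))  # 323
print(min_sample_size(0.5, 0.05))  # 385
```

With an expected prevalence of 30% and d = 0.05 at 95% confidence, the sketch gives n = 323. When nothing is known about p, taking p = 0.5 gives the largest, and therefore safest, value.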

Main principles of designing a questionnaire 
The questions should be: 

• relevant only to the specific objectives of the inquiry; 

• set out in suitable order (arranged such that sensitive questions are at the 
end); 

• preclassified and precoded wherever possible; 

• clear and unambiguous; 

• simple; 

• valid. 

Clinical trials 

Definition of clinical trials 

Planned experiments to compare the effectiveness of different regimens or meth- 
ods of treatment in human subjects. 
Need for clinical trials

• Evaluation of safety and efficacy of therapies, for which the results need to 
be: 

— non-subjective; 

— scientifically valid. 

• Opportunity to screen new drugs in a drug development programme. 

• Carefully designed trials are the only means to detect the usually small differ- 
ences between drugs or methods of treatment or the advantages of one over 
another. 

Phases of clinical trials 

The development of a drug usually undergoes four phases of experimentation: 
Phase I 

First experiments are carried out on human volunteers (usually after animal 
experimentation). The objectives of a phase I trial are to demonstrate the safety 
and non-toxicity of the drug. 

Phase II 

After success in a phase I trial, phase II trials are carried out to measure, with 
a stated precision, the effectiveness of the drug in order to decide whether 
further experimentation is warranted. In a drug development programme,
phase II trials are used to screen drugs rapidly for a promising level of
effectiveness.

Phase III 

Phase III trials are comparative trials of the new agent versus established stand- 
ards. The objective is to compare the efficacy and safety of the new method with 
those of the existing standard treatment under the same set of conditions and 
simultaneously. 

Phase IV 

Phase IV trials are large-scale studies to demonstrate the efficacy and safety of a 
drug after its introduction into general practice. Phase IV trials, which are also 
known as post-marketing surveillance studies, are necessary because of the usu- 
ally limited scope of phase III trials. 

Therapeutic and prophylactic trials 

• Therapeutic trials measure the efficacy of drugs or other therapeutic 
procedures (for example: diet, bed-rest, surgery, physiotherapy, ionizing 
radiation). 

• Prophylactic trials measure the effect of preventive measures on the health 
of populations (for example, control of pollution of water supplies, immuni- 
zations, change of diet, change of smoking habits, fortification of foodstuffs, 
weight reduction, contraception). 
Design of phase III trials

Most of the reports of clinical trials in the general medical journals are of the 
phase III type. They introduce the new treatment to the practising health care 
giver. 

In the design of a phase III clinical trial, the following considerations should be 
paramount: 

Objectives of the trial 

The objectives of the trial should: 

• be carefully stated; 

• not be too many; 

• primarily concern which of the two drugs or methods of treatment is better.

Need for an appropriate statistical design

The goal of an appropriate design should be that the patients receiving the new 
treatment and those receiving the standard treatment (the controls) are as alike 
as possible in everything likely to influence the outcome, other than the treat- 
ments they receive. One of the main ways of achieving this is to use an appropri- 
ate statistical design. The design should be simple, and appropriate for the 
objectives. A brief description of the basic designs is given in Handout 19.2. 

Bias 

Bias is a distortion in the perception of the effects of a treatment or in the meas- 
urement of differences between the effects of two treatments. 

Sources of bias: 

• systematic difference between treatment groups at admission into the trial; 

• differential practice in the follow-up of treatment groups; 

• differential assessment of outcome in treatment groups; 

• differential exclusion or withdrawal of subjects from the study. 

Methods of reducing bias: 

• randomization; 

• blinding; 

• uniform handling of procedures. 
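The text does not prescribe a particular randomization scheme. One common choice is randomization in permuted blocks, which keeps the numbers in each treatment group close to equal throughout recruitment. The Python sketch below is illustrative: the treatment labels, block size and function name are assumptions. In practice the list would be prepared and held by someone not involved in assessing patients, so that blinding is preserved.

```python
import random


def block_randomization(n_patients, treatments=("A", "B"), block_size=4, seed=None):
    """Allocation list in permuted blocks: each block holds every treatment
    equally often, in a random order, so group sizes stay balanced."""
    if block_size % len(treatments) != 0:
        raise ValueError("block size must be a multiple of the number of treatments")
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < n_patients:
        block = list(treatments) * (block_size // len(treatments))
        rng.shuffle(block)  # random order within the block
        allocations.extend(block)
    return allocations[:n_patients]


print(block_randomization(12, seed=2024))
```

For a two-arm trial with blocks of four, each successive group of four patients receives two A and two B allocations in random order, so the groups can never differ in size by more than two.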

Ethical considerations (see Outline 22) 

The following issues are important to ensure ethical acceptability of a clinical 
trial: 

• safety of the drugs or methods of treatment; 

• the existence of an honest hypothesis; 

• informed consent of the participants; 

• the right of participants to withdraw from the study at any time without 
sanctions; 
• confidentiality of information;

• the possibility of terminating the trial to prevent continued use of harmful or
inferior treatment.




Dummy tables; health survey; precoded data; sampling frame; self-coding record forms; survey 
questionnaire; systematic sampling; clinical, therapeutic and prophylactic trials; control group; 
cross-over trial; historical controls; matched controls; informed consent; placebo; placebo effect; 
randomization; sequential trial; single-blind and double-blind trials. 



Structure of the lesson 

More than one session may be needed to cover this topic. Give examples of published health
surveys and clinical trials throughout the lesson.

(a) Recall the limitations of health data available from routine sources, and the important role 
that ad hoc data collection plays in the health information system. Explain the meaning of a 
health survey, and illustrate with real examples the different types of objectives that a health 
survey might have. 

(b) Throughout the lesson, use references to actual surveys as much as possible to illustrate the 
applications of the various steps in survey planning, and to emphasize how decisions at each 
step must be made with the survey objective(s) firmly in mind. 

(c) Explain that survey planning should seek to ensure that the information yielded by the 
survey is reliable, valid for the reference population, obtained and processed in the most 
cost-effective way, and above all, answers the questions posed in the survey objectives. De- 
ficiencies in the information collected cannot be "made up for" or remedied even by the 
most sophisticated analytical methods or equipment. 

(d) As each step is dealt with, therefore, demonstrate its importance by discussing the
implications or consequences of poor or inadequate planning of that step.

(e) Briefly explain the essential features of clinical trials, including the meaning of, and need for, 
controls. Also explain the differences between the two categories of clinical trials (therapeu- 
tic and prophylactic). 

(f) Explain that scientific testing of the efficacy of therapies is needed because of the rapid
growth of laboratory medicine, and because the advantages of new drugs over established 
treatments are usually small. 

(g) Explain that the choice of a particular design is determined by the need to attribute observed
differences to the test drug. Explain the different types of possible controls and treatment 
allocations, with examples. 

(h) Explain the steps which should be taken to control bias, such as randomization, use of 
single-blind or double-blind studies, and use of uniform handling procedures and calibration 
of measuring instruments. 

(i) Explain and discuss the ethical constraints in clinical trials. The discussion should consider
whether it is ethical to have controls, or give an inert "treatment" (a placebo) to some pa- 
tients, and whether patients' consent should always be obtained. 

(j) Reading the publication on which the class exercise is based is strongly advised. Teachers
who want to use other reported clinical trials as class exercises are encouraged to do so. 





Lesson exercises 

Exercises should test the students' knowledge of the principles of the design and conduct of
surveys and clinical trials, and their understanding of the terminology. The teacher should use
examples of health surveys and clinical trials published in the literature, and set questions that 
test the students' knowledge of the various aspects and principles taught in class. The class may 
be divided into groups to present plans to conduct similar investigations on selected topics. 



Health surveys 

■ List five uses of a health survey. 



■ Indicate and discuss an appropriate sampling procedure for use in the follow- 
ing situations: 

• to estimate the prevalence of tinea capitis among schoolchildren in a town 
with 25 primary schools; 

• to estimate the regional distribution of patients who attended a large teaching
hospital in a 12-month period;

• to select 25% of patients for interview among those attending a physician's 
clinic in a single day. 
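For the third situation, a systematic sample is one common answer: take every fourth patient after a random start, which works even though the number of patients who will attend that day is not known in advance. A minimal Python sketch, assuming patients can be numbered in order of arrival; the function name and its arguments are illustrative, not taken from the source.

```python
import random

def systematic_sample(units, fraction, start=None, seed=None):
    """Select every k-th unit after a random start, where k = 1/fraction.

    Assumes 1/fraction is a whole number (e.g. 0.25 gives k = 4).
    """
    k = round(1 / fraction)
    if start is None:
        # random start among the first k units (positions 0..k-1)
        start = random.Random(seed).randrange(k)
    return list(units)[start::k]

# Hypothetical arrival numbers of 20 patients seen at the clinic in one day.
sample = systematic_sample(range(1, 21), 0.25, start=2)
```

With a start at the third patient, the sample is patients 3, 7, 11, 15 and 19, i.e. 25% of the 20 attendees. In practice the start would be drawn at random, for example from the chart of random numbers in the annexes.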



Clinical trials 

■ This class exercise is based on a clinical trial involving the use of three antimalarial
drugs, mefloquine, mefloquine/sulfadoxine/pyrimethamine (MSP) and
chloroquine, by Sowunmi and Oduola, of the Department of Pharmacology and
Therapeutics and Postgraduate Institute for Medical Research and Training,
University of Ibadan, Ibadan, Nigeria, published in the Transactions of the Royal Society
of Tropical Medicine and Hygiene, 1995, 89:303-305.



Objectives of the study 

• To determine the susceptibility of Plasmodium falciparum to mefloquine, 
mefloquine/sulfadoxine/pyrimethamine (MSP) and chloroquine. 

• To determine if the combined MSP had any therapeutic advantage over 
mefloquine alone. 

Design of the trial 

Source of patients 

A total of 150 children, aged between 6 months and 10 years, suffering from
acute symptomatic uncomplicated falciparum malaria were enrolled in the study.

Selection criteria 

• age 6 months to 10 years; 

• history of fever in the 24 hours preceding presentation, or pyrexia at
presentation;

• falciparum asexual parasitaemia;

• no antimalarial drug administered in the two weeks preceding presentation;

• negative urine test for anti-malarial drug; 

• no other causes of fever or concomitant illness or sickle cell disease; 

• with approval of parents or guardian. 

Allocation of patients to treatment 

Table 19.1 provides clinical and laboratory data on enrolment for patients as- 
signed to each of the three treatment groups. Details of dosage regimens are 
given in Table 19.2. 



Table 19.1 Clinical and laboratory data for patients with acute non-complicated
falciparum malaria on enrolment in the trial (a)

                               Treatment regimens
                               Mefloquine          MSP (b)             Chloroquine
Number of patients             43                  36                  36
Age (mean years)               3.5 (SD 2.3)        3.6 (SD 2.0)        3.6 (SD 2.3)
Weight (mean kg)               19 (SD 3.6)         18.9 (SD 2.6)       19.6 (SD 2.8)
Duration of fever (days)       1-4                 1-4                 1-4
Temperature (mean °C)          38.9 (SD 0.9)       38.8 (SD 1.0)       38.7 (SD 1.1)
Heart rate (mean per min)      141 (SD 13)         138 (SD 18)         140 (SD 18)
Parasitaemia (per µl):
  geometric mean               99 121              96 106              83 100
  range                        12 132-1 231 115    10 061-1 112 112    10 000-1 000 120
Haematocrit (mean %)           31 (SD 7)           30 (SD 6)           31 (SD 8)

Source: Sowunmi A, Oduola AMJ. Open comparison of mefloquine, mefloquine/sulfadoxine/pyrimethamine and chloroquine
in acute uncomplicated falciparum malaria in children. Transactions of the Royal Society of Tropical Medicine and
Hygiene, 1995, 89:303-305. Reproduced by permission.

(a) Of the 150 children enrolled in the study, 35 were excluded from the analysis for various reasons.
(b) Mefloquine/sulfadoxine/pyrimethamine.
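Parasite densities in Table 19.1 span about two orders of magnitude, which is presumably why they are summarized by a geometric mean rather than an arithmetic one. The sketch below uses hypothetical counts (not the trial's data) to show how a single very high count pulls the arithmetic mean upwards, while the geometric mean (the antilog of the mean of the logarithms) stays closer to a typical value.

```python
import math
from statistics import fmean, geometric_mean

# Hypothetical parasite counts per µl for five children (illustrative only).
counts = [12_000, 45_000, 90_000, 150_000, 1_200_000]

arithmetic = fmean(counts)          # dominated by the one very high count
geometric = geometric_mean(counts)  # library routine

# The same quantity computed by hand: exp(mean(log(count))).
via_logs = math.exp(fmean(math.log(c) for c in counts))
```

Here the arithmetic mean is 299 400 per µl, whereas the geometric mean is a little under 100 000 per µl, close to the middle count.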



Table 19.2 Randomization of patients into the three treatment groups

Treatment group            Dosage

Mefloquine                 25 mg/kg of body weight as a single oral dose on day 0

Mefloquine, sulfadoxine    Tablets containing:
and pyrimethamine            125 mg of mefloquine
                             250 mg of sulfadoxine
                             12.5 mg of pyrimethamine
                           given as a single oral dose on day 0 according to body weight:
                             Body weight (kg)    Tablets
                             5-10                1
                             11-20               2
                             21-30               3
                             31-45               4

Chloroquine                25 mg base/kg of body weight given orally over 3 days:
                           10 mg/kg on days 0 and 1, 5 mg/kg on day 2
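The dosage rules in Table 19.2 can be written as small lookup functions, which is also a convenient way to check that the chloroquine schedule adds up to the stated 25 mg/kg. This is a sketch for checking arithmetic, not dosing software. The function names are hypothetical, and treating each weight band's upper limit as inclusive (so that a child of 10.5 kg falls in the 11-20 kg band) is an assumption, since the table lists whole kilograms only.

```python
def mefloquine_dose_mg(weight_kg: float) -> float:
    """Single oral dose on day 0: 25 mg/kg of body weight."""
    return 25 * weight_kg

def msp_tablets(weight_kg: float) -> int:
    """Number of MSP tablets for the single day-0 dose, from the weight bands."""
    # Assumption: upper limits inclusive; weights between bands go to the next band.
    if weight_kg < 5:
        raise ValueError("weight below the range covered by Table 19.2")
    for upper_kg, tablets in [(10, 1), (20, 2), (30, 3), (45, 4)]:
        if weight_kg <= upper_kg:
            return tablets
    raise ValueError("weight above the range covered by Table 19.2")

def chloroquine_schedule_mg(weight_kg: float) -> dict:
    """mg of chloroquine base by day: 10 mg/kg on days 0 and 1, 5 mg/kg on day 2."""
    mg_per_kg_by_day = {0: 10, 1: 10, 2: 5}
    return {day: rate * weight_kg for day, rate in mg_per_kg_by_day.items()}
```

For a 20 kg child, the chloroquine schedule is 200 mg, 200 mg and 100 mg, a total of 500 mg, i.e. 25 mg/kg.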
Follow-up

Post-treatment follow-up on day 14. 

Outcome 

A total of 115 children completed the study: 43 in the mefloquine group and 36 each in
the MSP and chloroquine groups. Therapeutic responses are shown in Table 19.3.

• Parasite clearance time was calculated as the time from drug administration
to the time when a patient's parasitaemia became zero and remained so for at
least 72 hours.

• Fever clearance time was taken as the time from drug administration until the
core temperature fell to, or below, 37.2 °C and remained so for at least 48
hours.

• Treatment in all groups was considered a failure if the parasite count on day
3 was over 25% of that on day 0, or if parasitaemia cleared and then
reappeared within 14 days of treatment.
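The three definitions above are precise enough to be applied mechanically to serial measurements. A minimal sketch, assuming each patient's readings are available as (hours since drug administration, value) pairs. The function names are hypothetical, and returning None when follow-up is too short to confirm a sustained clearance is an assumption of this sketch.

```python
from typing import Callable, Optional, Sequence, Tuple

Reading = Tuple[float, float]  # (hours since drug administration, measured value)

def clearance_time(readings: Sequence[Reading],
                   cleared: Callable[[float], bool],
                   hold_hours: float) -> Optional[float]:
    """First time at which `cleared` holds and keeps holding for `hold_hours`.

    Returns None if clearance is never confirmed within the follow-up.
    """
    readings = sorted(readings)
    last_time = readings[-1][0]
    for i, (t, _) in enumerate(readings):
        if t + hold_hours > last_time:
            break  # not enough follow-up left to confirm a sustained clearance
        window = [v for s, v in readings[i:] if s <= t + hold_hours]
        if all(cleared(v) for v in window):
            return t
    return None

def parasite_clearance_time(counts):
    # parasitaemia zero for at least 72 hours
    return clearance_time(counts, lambda c: c == 0, hold_hours=72)

def fever_clearance_time(temps):
    # core temperature at or below 37.2 °C for at least 48 hours
    return clearance_time(temps, lambda temp: temp <= 37.2, hold_hours=48)

def is_treatment_failure(day0_count, day3_count, reappeared_within_14_days):
    # day-3 count above 25% of day-0 count, or clearance followed by reappearance
    return day3_count > 0.25 * day0_count or reappeared_within_14_days
```

A count that drops to zero at 24 hours but is positive again at 48 hours does not count as cleared at 24 hours; the clearance time is the start of the first zero run that lasts the full 72 hours.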



Table 19.3 Therapeutic responses of patients with acute uncomplicated falciparum
malaria

                                      Treatment regimens
                                      Mefloquine          MSP (a)             Chloroquine
No. of patients                       43                  36                  36
Fever clearance time (hours):
  mean                                49.9 (SD 12.1)      49.6 (SD 13.1)      56.6 (SD 17.1)
  range                               48-72               48-72               48-96
Reduction of parasitaemia (%)         67.4 (SD 31.5) (b)  66.5 (SD 30.1) (c)  60.1 (SD 29.6) (d)
Parasite clearance time (hours):
  mean                                49.4 (SD 13.8)      46.7 (SD 8.9)       53.6 (SD 8.6)
  range                               42-96               42-96               42-120 (e)
No. with increased parasitaemia
  at 12 hours                         12                  10                  9
Response (no. of patients) (f):
  Cured                               43                  35                  30
  RI                                  0                   1                   4
  RII                                 0                   0                   2
  RIII                                0                   0                   0
Cure rate at day 14 (%)               100                 97.2                83
Cure rate at day 28 (%)               95.3                94.4                75

Source: Sowunmi A, Oduola AMJ. Open comparison of mefloquine, mefloquine/sulfadoxine/pyrimethamine and chloroquine
in acute uncomplicated falciparum malaria in children. Transactions of the Royal Society of Tropical Medicine and
Hygiene, 1995, 89:303-305. Reproduced by permission.

(a) Mefloquine/sulfadoxine/pyrimethamine.
(b) 31 patients.
(c) 26 patients.
(d) 27 patients.
(e) 30 patients.
(f) RI = parasitaemia disappears within 7 days but reappears within 14 days; RII = decrease but no complete
disappearance of parasites from peripheral blood; RIII = no pronounced change in parasitaemia 48 hours after treatment.
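One way to examine differences in cure at day 14 is Fisher's exact test on the "Cured" counts of Table 19.3, which suits tables with only a handful of treatment failures. A sketch using only the Python standard library (the function name is illustrative). With three regimens, a pairwise comparison such as this one should be interpreted with the number of comparisons in mind.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are treatment groups, columns are (cured, not cured). The p-value sums
    the hypergeometric probabilities of all tables with the same margins that
    are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # probability that the top-left cell equals x, margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # small tolerance so tables tied with the observed one are not lost to rounding
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-7))

# Day-14 "Cured" counts from Table 19.3: mefloquine 43 of 43, chloroquine 30 of 36.
p_mef_vs_chq = fisher_exact_two_sided(43, 0, 30, 6)
```

For mefloquine against chloroquine the two-sided p-value is about 0.007, suggesting that the higher day-14 cure rate with mefloquine is unlikely to be due to chance alone.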
■ What are the possible justifications for each of the selection criteria for
admission to the trial?



■ What factors would you examine to assess whether the three treatment groups 
were comparable at the beginning of the study? 



■ What measure of severity of disease might be used? 



■ What other patient breakdown (distribution) would have been informative
in Table 19.3? 



■ What differences in the treatment responses were detected at day 14?

■ Is the difference in the percentage reduction of parasitaemia at 24 hours
between the mefloquine group and the chloroquine group statistically
significant?



■ What inferences can be made about the relative efficacy of the three
treatment regimens?
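For the question on the percentage reduction of parasitaemia, the summary statistics in Table 19.3 are enough for a two-sample comparison. The sketch below computes Welch's unequal-variance t statistic and uses a normal approximation for the p-value, which is reasonable here because the degrees of freedom are about 56. It assumes that the "Reduction of parasitaemia" row is the 24-hour figure the question refers to, and takes the group sizes from footnotes b and d. Percentages bounded at 100% with SDs near 30 are unlikely to be normally distributed, so the result should be treated as indicative only.

```python
from math import sqrt
from statistics import NormalDist

def welch_from_summary(m1, s1, n1, m2, s2, n2):
    """Welch t statistic, its approximate df, and a normal-approximation p-value."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # squared standard errors of each mean
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))  # Welch-Satterthwaite
    # Two-sided p-value from the standard normal; close to the t value when df is large.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p

# Mefloquine: 67.4 (SD 31.5), 31 patients; chloroquine: 60.1 (SD 29.6), 27 patients.
t, df, p = welch_from_summary(67.4, 31.5, 31, 60.1, 29.6, 27)
```

This gives t of about 0.91 and a two-sided p of about 0.36, so the observed difference of 7.3 percentage points would not be statistically significant at the 5% level.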
Definitions of new terms and concepts



Clinical trial: Comparison between two or more treatment regimens in humans to assess their relative efficacy. 

Cross-over trial: A trial in which patients act as their own controls by receiving both the treatment being assessed 
and the control treatment, in random order. 

Historical controls: Results of a standard treatment (or no treatment) extracted from clinical records or from the 
literature. 

Informed consent: Agreement to participate in a trial, given after having been informed of, and having understood,
all the trial procedures and their possible consequences.

Placebo: An inert or dummy pharmacological or surgical treatment. 

Placebo effect: The subjective element introduced by the application of any treatment. 

Randomized controlled trial: A trial in which patients are allocated at random — by chance methods — to the 
test treatment(s) and control group(s). 

Single-blind and double-blind trial: In a single-blind trial, the patient is unaware which treatment he or she is 
receiving. In a double-blind trial, neither the patient nor the doctor assessing the response is aware which 
treatment the patient is receiving. 
Clinical trial designs



Parallel groups design 

Test treatment given to one group of subjects and a standard treatment to another group of subjects, both groups 
being tested simultaneously. The trial is a randomized clinical trial if the subjects are allocated to either treatment 
randomly (in the statistical sense). 
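Random allocation "in the statistical sense" can be produced by a computer instead of the chart of random numbers. A minimal sketch, assuming subjects are identified by number; the function and its parameters are hypothetical. Shuffling a list containing equal numbers of each treatment label is equivalent, for an even number of subjects, to choosing at random which half receives the test treatment.

```python
import random
from collections import Counter

def allocate_parallel(subject_ids, treatments=("test", "standard"), seed=None):
    """Randomly allocate subjects so that group sizes differ by at most one."""
    rng = random.Random(seed)  # seeding makes the allocation list reproducible
    ids = list(subject_ids)
    # Equal numbers of each label, then a random permutation of the labels.
    labels = [treatments[i % len(treatments)] for i in range(len(ids))]
    rng.shuffle(labels)
    return dict(zip(ids, labels))

allocation = allocate_parallel(range(1, 11), seed=2024)
group_sizes = Counter(allocation.values())
```

In a real trial the allocation list would be prepared in advance and concealed from those enrolling patients, so that knowledge of the next assignment cannot influence who is admitted.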

Matched control design 

Individual subjects, or groups of subjects, are first matched in pairs according to variables that are likely to influence 
the outcome of treatment (such as age, sex, weight), then the members of the pairs of individuals or groups are 
randomly assigned to the test and comparison treatments. 
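In the matched control design the randomization happens within each pair, in effect a coin toss per pair. A minimal sketch, assuming the pairs have already been formed on the matching variables; the pair identifiers and the function name are hypothetical.

```python
import random

def allocate_matched_pairs(pairs, seed=None):
    """Within each matched pair, randomly choose which member gets the test treatment."""
    rng = random.Random(seed)
    allocation = {}
    for first, second in pairs:
        if rng.random() < 0.5:  # "heads": first member receives the test treatment
            allocation[first], allocation[second] = "test", "comparison"
        else:
            allocation[first], allocation[second] = "comparison", "test"
    return allocation

# Hypothetical pairs already matched on age, sex and weight.
pairs = [("P01", "P02"), ("P03", "P04"), ("P05", "P06")]
allocation = allocate_matched_pairs(pairs, seed=7)
```

Every pair ends up with exactly one member on each treatment, so the matching variables are balanced between the groups by construction.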

Cross-over design 

Each subject receives, in succession, the test and comparison treatments, with a suitable "wash-out" period between 
the two treatments. The order in which the treatments are given is randomized so that half the subjects receive the test 
treatment first and vice versa. 

Designs with external controls 

The comparison group is not handled simultaneously with the group on the test treatment but developed outside the 
current study by, for example, using historical controls. These designs have the advantage of being cheaper than other 
designs but have the problems of: 

• changes in patient selection; 

• changes in experimental environment. 

Sequential trials 

Patients are allocated randomly to the test and comparison treatments, as they present themselves, and the results are 
analysed continuously. The trial is halted when the results reach a predetermined level of significance, or when
they reach a closure boundary without significance having been demonstrated. The size of the trial is not fixed in advance.
OUTLINE 20 Use of computers in health sciences



Introduction to the lesson 

Recent developments in computer technology have had a notable impact on the handling of
health statistics and have provided better opportunities for their use. Computers and general-purpose
software have made the collection, handling, analysis and storage of large amounts of data easier,
faster and cheaper. They have also made it possible to produce tabulations, graphs and routine
reports more efficiently.

Objective of the lesson 

The objective of this lesson is to enable the students to appreciate the usefulness of computers in 
health sciences. 

Enabling objectives 

At the end of the lesson the students should be able to: 

(a) Describe major components of a computer system. 

(b) Describe the importance, use and role of computers in the field of health. 

(c) Demonstrate familiarity with at least one statistical software package, for example the
public-domain Epi Info, by:

• creating a computer-based questionnaire; 

• creating a simple data set using that questionnaire; 

• making at least one analysis of that data set. 

Required previous knowledge 

A basic appreciation of the responsibilities of a health professional in providing quick and 
adequate care. Familiarity with the steps in data-based decision processes. No other previous 
computer knowledge is needed. 



Lesson content 

Parts of a computer system 

Description and identification of various parts of a computer system. 

• A computer system consists of hardware, software and the computer
operator (a human being), the last being the most important element.

• Hardware: the central processing unit (CPU), made up of chips, integrated
circuits and printed circuit boards; and the hard disk, keyboard, monitor,
printer, modem, mouse, power supply cables, etc.
• Software (the set of machine-readable instructions):

— machine software resident on computer chips; 

— other software loaded into a computer to perform specific types of tasks, 
such as operating the computer or data management. 

Major functions of various parts (see Handout 20.1) 

• The CPU controls the execution of instructions for the computer.

• The hard disk stores data, text, images and software. 

• Keyboard, monitor, printer, modem and mouse are input and output 
devices. 

• A software application consists of instructions to the computer in logical 
sequence for specific functions. 

General uses of a computer system 

As a data and word-processing machine: 

— performing intricate or repetitious calculations; 

— management of databases; 

— preparing and formatting text. 

As a storage and retrieval device: 

— data and text storage on hard disk or on an external storage medium; 

— retrieval of data and text from hard disk or an external storage medium. 

As a communication device: 

— through interfacing with a telephone or satellite communication. 

As a device to process graphics, images and sounds: 

— using special software and hardware for animation of films, manipulations of 
photographs and other pictures, and synthesizing music. 

As a teaching device: 

— computer-aided lessons, which can be interactive and self-paced, and can be
repeated as often as needed.

Uses of computers in health sciences 

The primary role of computers in health is as a tool for data management and 
processing. Uses include: 

• Easy access to, and full exploitation of, statistical tools, which would 
otherwise be seldom used because of the complexity of the calculations 
involved. 

• Exploring the relationships of variables in data sets through use of different 
scenarios. 

• Access to the international literature and its instant selective retrieval.
• Quick and more effective communication facilities using technologies such
as electronic mail (e-mail). 

• Record archiving. 

• Use as a diagnostic tool in selecting an appropriate treatment, and in assess- 
ing prognosis. 

Misuses of computers 

• Indiscriminate use of statistical methods without worrying about underlying 
assumptions, such as normality, independence and equality of variance. Most 
statistical packages do not check these assumptions. Thus, conclusions of 
dubious value can be reached if sufficient care is not exercised. 

• Over-dependency on computer-based systems without worrying about 
their limitations. For example, computer systems are sometimes used for 
decision-making rather than as tools. The ultimate decision should 
lie with the human expert and not with the machine. The machine can 
only help in providing clues and likely alternatives. The limitations of 
computer-based systems may also be ignored by the users of on-line biblio- 
graphic databases containing citations from scientific journals. Since many 
journals from developing countries and non-serial publications are not in- 
dexed, such databases cannot be considered perfect windows on the world's 
literature. 

• Being unaware that a computer does only what it is asked to do. If the
instructions are wrong, the results cannot be right. Thus, it is important that
only well-tested software is used.
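As the first point above notes, most packages will run a pooled-variance t test whether or not the variances are similar. A check of this kind takes only a few lines and can be run first. The sketch below is a crude screen, not a formal test. The function names are hypothetical, and the cut-off (a variance ratio of 4, i.e. standard deviations differing by a factor of 2) is a rule-of-thumb assumption.

```python
from statistics import variance

def variance_ratio(x, y):
    """Ratio of the larger to the smaller sample variance of two groups."""
    vx, vy = variance(x), variance(y)
    if min(vx, vy) == 0:
        raise ValueError("a group has zero variance; the ratio is undefined")
    return max(vx, vy) / min(vx, vy)

def equal_variance_plausible(x, y, max_ratio=4.0):
    """Rule-of-thumb screen before assuming equal variances (assumed cut-off)."""
    return variance_ratio(x, y) <= max_ratio

# Hypothetical measurements in two groups.
group_a = [1, 2, 3]
group_b = [10, 20, 30]
ok_to_pool = equal_variance_plausible(group_a, group_b)
```

When the screen fails, a method that does not assume equal variances (such as Welch's t test) is the safer choice.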

Software and operating systems 

• Types of software 

— Machine software which is resident on the computer chips and devices. 

— Operating systems which interface between the machine software 
and application software. These systems regulate the functions and 
behaviour of the application software. The role is similar to that of a 
government. 

— Application software, which performs tasks of direct interest to the user. 

• The operating system selected depends on whether it is to support only one 
user per machine or to support terminals and thus allow several users to 
work simultaneously on the same CPU. 

• Salient features of operating systems include: directory structure and path; 
utilities such as format, erase, copy, autoexecute; access to disk drives, and 
printer; and editing features to write or modify a text. 

Epidemiological information processing package (Epi Info) 

• A general purpose package for handling epidemiological data, from the stage 
of data collection to data analysis and report writing. 
• Components of Epi Info:

— calculations of sample size for different kinds of studies; 

— text editor; 

— data entry; 

— data analysis; 

— data validation; 

— statistical calculation; 

— data transfer. 



Algorithm; application software; ASCII; bit and byte; chip; computer; directory; drive; file;
hard copy; hardware; language; machine software; memory; operating system; path; PC; 
program; random access memory (RAM); read only memory (ROM); secondary memory; software; 
terminal. 



Structure of the lesson 

(a) The introduction of the students to the different applications of computers is best done in 
conjunction with the lessons covering the relevant topics (for example, data organization, 
presentation and analysis). Students should, whenever possible, have hands-on experience 
with a computer. 

(b) The lesson should start with a simple but thorough description of the functional parts of a 
computer set-up: 

• hardware; 

• different types (mainframe, mini, micro, laptop, etc.); 

• input and output media; 

• software. 

(c) A demonstration of the operating system should then be given, focusing on operations that 
students will need to know in order to use a computer successfully. 

(d) An overall view of the use of computers in health sciences should be illustrated with 
actual demonstrations. A public domain software, such as Epi Info, could be used for this 
purpose. 

(e) Explain the many uses of Epi Info and its capacity to process information from
epidemiological investigations. First demonstrate the generation of a questionnaire, data
entry, analysis, graphics and report writing, and then ask the students to do this exercise
themselves. A short questionnaire, say just one page long, with some skip patterns, should
suffice. It should be on a topic of wide interest, such as birth weight and some of its
antecedents. Let the students do at least one analysis (cross-tabulation, derivation of means,
graphs, etc.). Writing the report may be optional, depending on the facilities available and
the students' interest.
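The ideas behind this exercise (a questionnaire with a skip pattern, followed by a cross-tabulation and a derivation of means) can be illustrated outside any particular package. The sketch below is plain Python, not Epi Info, and the question names and data are hypothetical. Low birth weight is taken as under 2500 g.

```python
from collections import Counter
from statistics import mean

# Hypothetical three-item questionnaire. "ask_if" implements a skip pattern:
# the item is asked only when the condition on earlier answers holds.
QUESTIONS = [
    {"name": "smoked"},  # "y" or "n": did the mother smoke during pregnancy?
    {"name": "cigs_per_day", "ask_if": lambda rec: rec["smoked"] == "y"},
    {"name": "birth_weight_g"},
]

def apply_skips(answers):
    """Build a record in questionnaire order, blanking (None) any skipped item."""
    record = {}
    for q in QUESTIONS:
        asked = q.get("ask_if", lambda rec: True)(record)
        record[q["name"]] = answers.get(q["name"]) if asked else None
    return record

def cross_tab(records):
    """Counts of (smoked, low birth weight) combinations."""
    return Counter((r["smoked"], r["birth_weight_g"] < 2500) for r in records)

def mean_weight_by_group(records):
    groups = {}
    for r in records:
        groups.setdefault(r["smoked"], []).append(r["birth_weight_g"])
    return {group: mean(weights) for group, weights in groups.items()}

raw = [
    {"smoked": "y", "cigs_per_day": 10, "birth_weight_g": 2300},
    {"smoked": "y", "cigs_per_day": 5, "birth_weight_g": 2900},
    {"smoked": "n", "cigs_per_day": 3, "birth_weight_g": 3400},  # inconsistent entry
    {"smoked": "n", "birth_weight_g": 2400},
]
records = [apply_skips(answers) for answers in raw]
table = cross_tab(records)
```

The third record shows why skip patterns help data quality: the cigarette count entered for a non-smoker is blanked rather than analysed.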
Lesson exercises

The exercises for this lesson should aim at helping the students to appreciate the use of com- 
puters in the health field, rather than focusing on the details of computer configuration and 
programming. Students should, therefore, be provided with exercises that help them to practise 
using computers as a tool in handling their data. 



■ Generate a short questionnaire of about a page in length, on a topic of gen- 
eral interest. 



■ Analyse data on a small number of subjects, using general-purpose software
such as Epi Info.
Definitions of new terms and concepts



Algorithm: Logical arrangement of instructions to perform a task. 

Application software: Packages of programs which perform special groups of tasks in a specific manner. These 
tasks are of direct interest to the user. 

ASCII: Acronym for American Standard Code for Information Interchange. This code is widely used to exchange data
and text so that they can be read by users of different applications. It helps in transferring files written in one
application or language to another.

Bit and byte: The basic unit of memory is a bit, and a set of 8 bits is a byte. One byte is required to store one
character (letter, digit or space). One kilobyte (kB) is 1024 bytes, roughly 1000 bytes, and one megabyte (MB)
is 1024 kB (1 048 576 bytes), nearly one million bytes.
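The byte arithmetic and ASCII codes above can be checked directly in most programming languages. A short Python illustration; the constant and function names are chosen for this example only.

```python
BITS_PER_BYTE = 8
KB = 1024        # bytes in a kilobyte (binary convention)
MB = 1024 * KB   # 1 048 576 bytes, "nearly one million"

def bytes_needed(text: str) -> int:
    """Bytes needed to store `text` as ASCII, one byte per character.

    Raises UnicodeEncodeError if the text contains non-ASCII characters.
    """
    return len(text.encode("ascii"))

# ASCII codes for a letter, a digit and a space.
codes = [ord(ch) for ch in "A1 "]
```

"Epi Info" (eight characters, including the space) therefore needs 8 bytes, or 64 bits, and the letter A, the digit 1 and the space have ASCII codes 65, 49 and 32.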

Chip: A small computer device containing memory space and machine software. 

Computer: A machine that can be programmed to carry out a series of instructions. 

Directory: A place on a disk containing a list of files. A directory may contain sub-directories, each of which
contains files. The user assigns a unique name to each directory and sub-directory.

Drive: The part of the computer used to read (and write) storage media such as disks. Generally, three drives are
available, called A, B and C. The first two are used to read floppy disks or diskettes and the last refers to the hard
disk. Sometimes the hard disk is partitioned into several parts called the C, D, E drives, etc.

File: A collection of data records, text or program statements which are put together and given a name for reference 
or retrieval. 

Hard copy: A printed version of the inputs or the outputs. 

Hardware: Tangible parts of a computer, such as integrated circuits, printed circuit boards, ports, keyboard, monitor, 
printer, terminal, modem and mouse. 

Language: A system of words and symbols, obtained by combination of keyboard keys, which can be understood by 
the computer. Several languages are available, such as FORTRAN, COBOL, C and APL. 

Machine software: Programs embedded in electronic devices during their manufacture. They come as part of the 
machine. 

Memory: That part of a computer system where programs and data are stored. 

Operating system: Software that governs the usage of the application software. It is an interface between the 
machine software and the application software. 

Path: The route that a computer device should follow to locate a file in the memory. For example, 
C:\DIR_1\SUB_DIR_A\MYFILE means that the file called MYFILE is located in sub-directory SUB_DIR_A which 
is in the directory called DIR_1, and that this directory is available on the C-drive of the computer. 
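The same path can be taken apart programmatically, which is how software locates the drive, directories and file name. A short sketch using Python's standard `pathlib` module. `PureWindowsPath` parses DOS/Windows-style paths on any operating system without accessing the disk.

```python
from pathlib import PureWindowsPath

path = PureWindowsPath(r"C:\DIR_1\SUB_DIR_A\MYFILE")

drive = path.drive             # the drive letter
folders = path.parts[1:-1]     # directories from the root down to the file
file_name = path.name          # the file itself
containing = str(path.parent)  # the sub-directory that holds the file
```

Here `drive` is "C:", `folders` is ("DIR_1", "SUB_DIR_A") and `file_name` is "MYFILE", matching the reading of the path given above.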

PC: An acronym for personal computer — a type of tabletop or smaller computer. These computers may be small 
in size but can have enormous capacity, including the capability for simultaneous handling of several 
tasks. 



Program: Task-specific algorithm written in a language that can be deciphered by a computer. 

Random access memory (RAM): Memory built into a computer for temporary storage of the software, the
intermediate steps and the results, all of which are required to perform the task the computer is asked to do.

Read only memory (ROM): Memory containing the instructions that are executed when the power is turned
on. Now also available in secondary memory (see the description below).

Secondary memory: Storage under the control of the computer operator, such as hard disk, floppy disk (diskette)
and magnetic tape. Compact disc read only memory (CD-ROM) comes on disks containing large amounts of data or
software which cannot be altered.

Software: Collection of interrelated programs sold as a package. These are of various types. 

Terminal: Generally, a set of monitor and keyboard, without its own central processing unit (CPU), which is con- 
nected to a computer to provide a facility for additional persons to use the same CPU. A PC can also be 
connected to another CPU and can then be used as a terminal. 
OUTLINE 21 Rapid methods for interim assessment



Introduction to the lesson 

Health workers at all levels need to be able to assess their activities continuously, especially 
during the implementation of priority programmes. The assessment should be focused, simple 
and rapid, for corrective action to be timely. Conventional procedures for assessing health pro- 
grammes tend to be lengthy, expensive and very often require sophisticated methodology. Other 
procedures for interim assessment have therefore been developed, which depend on less sophis- 
ticated methodology. 

Objective of the lesson 

The objective of this lesson is to introduce the students to the methods which could be used for 
rapid interim assessment of health programmes and activities. 



Enabling objectives 

At the end of the lesson the students should be able to: 

(a) Explain the concept of rapid interim assessment. 

(b) List at least two methods that can be used to assess health activities rapidly.

(c) Describe the strengths and weaknesses of those methods. 

Required previous knowledge 

Contents of previous lessons, particularly lessons based on Outlines 1-9, 16, 19 and 20. 



Lesson content 

Concept of rapid interim assessment 

• Rapidity: referring to the time it takes to go through the entire assessment 
process. 

• Interim: implying that the assessment is a stop-gap; definitive results need 
more rigorous methods. 

The following methods should be covered by the lesson: 

(a) The modified cluster survey methodology developed for the assessment of
immunization coverage (the Expanded Programme on Immunization (EPI)
immunization coverage assessment methodology).

(b) Case-control methodology for programme assessment. 

(c) Focus-group discussions. 
(d) Delphi techniques (knowledgeable key informants).

(e) Geographical information systems.

Modified cluster survey (EPI immunization coverage assessment) 

(a) General description of the methodology.

(b) Relationship to the classical cluster survey.

(c) Sample selection:

• cluster selection; 

• sampling unit selection. 

(d) Sample size: 

• precision; 

• design effect; 

• sample size estimation. 

(e) Data analysis:

• parameter estimations; 

• weighted analysis. 

(f) Strengths and weaknesses of the methodology.
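Two of the steps above, sample size with a design effect and cluster selection, are short calculations. The sketch below assumes the usual formula for estimating a proportion p to within ±d, inflated by the design effect, and first-stage selection of clusters with probability proportional to size (PPS) by the systematic method. The function names are hypothetical. With p = 0.5, d = 0.10 and a design effect of 2 it gives 193 children, close to the 210 (30 clusters of 7) with which the EPI method is commonly described.

```python
import math
import random

def cluster_sample_size(p, d, deff, z=1.96):
    """n = deff * z^2 * p(1 - p) / d^2, rounded up to a whole number of individuals."""
    return math.ceil(deff * z**2 * p * (1 - p) / d**2)

def pps_systematic(populations, n_clusters, start=None, seed=None):
    """Select cluster indices with probability proportional to size.

    Walks along the cumulative population total in steps of (total / n_clusters)
    from a random start; a large cluster can be selected more than once.
    """
    total = sum(populations)
    interval = total / n_clusters
    if start is None:
        start = random.Random(seed).uniform(0, interval)  # random start within the first interval
    targets = [start + k * interval for k in range(n_clusters)]
    selected, cumulative, i = [], 0, 0
    for index, size in enumerate(populations):
        cumulative += size
        while i < len(targets) and targets[i] <= cumulative:
            selected.append(index)
            i += 1
    return selected

# Hypothetical populations of four villages; two clusters, fixed start for illustration.
chosen = pps_systematic([100, 300, 200, 400], n_clusters=2, start=150)
```

With a total of 1000 and an interval of 500, the selection points 150 and 650 fall in the second and fourth villages, so `chosen` is [1, 3] (zero-based indices).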

Case-control methods for programme assessment 

(a) General description of the methodology. 

(b) Issues concerning identification of "cases" and "controls". 

(c) Measurement of exposure. 

(d) Minimum sample size for hypothesis testing in case-control studies. 

(e) Sources of bias: 

• selection; 

• misclassification; 

• confounding. 

(f) Data analysis:

• odds of exposure; 

• odds ratio as estimate of risk; 

• significance testing for odds ratio; 

• attributable risk. 

(g) Strengths and weaknesses of the methodology.
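The data-analysis items above can be illustrated with a 2×2 table. A minimal sketch, assuming the layout a, b = exposed and unexposed cases and c, d = exposed and unexposed controls. It uses Woolf's logarithmic method for the 95% confidence interval (one common choice) and (OR − 1)/OR as the attributable fraction among the exposed, which approximates the true value when the disease is rare. The counts and the function name are hypothetical.

```python
import math

def case_control_summary(a, b, c, d, z=1.96):
    """a, b: exposed/unexposed cases; c, d: exposed/unexposed controls."""
    odds_cases = a / b            # odds of exposure among cases
    odds_controls = c / d         # odds of exposure among controls
    odds_ratio = odds_cases / odds_controls
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf's method
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return {
        "odds_ratio": odds_ratio,
        "ci95": (lower, upper),
        # attributable fraction among the exposed, using OR as an estimate of relative risk
        "af_exposed": (odds_ratio - 1) / odds_ratio,
    }

# Hypothetical study: 40 of 100 cases and 20 of 100 controls were exposed.
summary = case_control_summary(40, 60, 20, 80)
```

Here the odds ratio is about 2.67 with a 95% interval of roughly 1.4 to 5.0. Because the interval excludes 1, the association would be significant at the 5% level, and about 62% of the cases among the exposed would be attributable to the exposure if it is causal.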

Focus-group discussions 

(a) General description of the methodology.

(b) Uses of focus-group discussions. 
(c) Group formation:

• homogeneity; 

• group size; 

• logistical problems.

(d) Analysis of the discussion recordings: 

• general issues relevant to analysis of qualitative data; 

• linkage of the different recordings of the discussions (notes, audio and 
video). 

(e) Strengths and weaknesses of the methodology.

Delphi techniques (knowledgeable key informants) 

(a) General description of the methodology. 

(b) Selection of knowledgeable informants.

(c) Formulation of questions. 

(d) Data analysis: 

• general issues relevant to analysis of qualitative and subjective data; 

• consensus formation; 

• validation of responses. 

(e) Strengths and weaknesses of the methodology.
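Consensus formation in a Delphi exercise is often summarized, for each item and round, by the median rating and the interquartile range (IQR) of the informants' ratings. A minimal sketch, assuming a numerical rating scale and declaring consensus when the IQR is at most one scale point. That cut-off is an assumption, since published Delphi studies use a variety of criteria. The function name and ratings are hypothetical.

```python
from statistics import median, quantiles

def delphi_round_summary(ratings, max_iqr=1.0):
    """Median, IQR and a consensus flag for one item in one Delphi round."""
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    iqr = q3 - q1
    return {"median": median(ratings), "iqr": iqr, "consensus": iqr <= max_iqr}

# Hypothetical ratings (1-5 scale) from eight key informants on one item.
round_2 = delphi_round_summary([4, 4, 5, 4, 3, 4, 5, 4])
```

Items that do not reach consensus would be fed back to the informants, with the group summary, for rating in the next round.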

Geographical information systems 

(a) General description of the methodology and its application to health research. 

(b) Hardware and software for geographical information systems. 

(c) Uses of geographical information systems. 

• storage, management and integration of large spatially referenced data 
sets; 

• spatial data retrieval; 

• geographically related data analyses; 

• data mapping. 

(d) Strengths and weaknesses of the methodology. 









Attributable risk; bias; case-control; cluster survey; confounding; controls; data mapping; exposure;
misclassification bias; odds ratio; odds of exposure; rapid interim assessment; selection bias;
spatial data.
Structure of the lesson

Coverage of this topic may need more than one session. The methods do not have to be covered
in any particular order. As many examples as possible should be used.

(a) For each method, prepare a general description of the methodology drawing the students' 
attention to linkages with some of the topics already covered in lesson Outlines 1-9, 16, 19 
and 20. 

(b) Coverage of each method should follow the order indicated in the "Lesson content" section, 
above. 

(c) Emphasize the interim nature of these methods, explaining that for a definitive assessment 
more robust methods have to be used. 



Lesson exercises 

The teacher should obtain several examples of studies and ask students to identify the methods 
used for each study. The focus of the exercises should be on the circumstances for which each 
method can be used appropriately, and on the advantages and disadvantages of each method. 



■ A nongovernmental organization is planning a study to determine the
sustainability of its health delivery service in a country. It wishes to ascertain
the motivation and job satisfaction of the staff, and the community's perception
of the service. Which survey approach would you recommend for collecting the
required information? How does this approach differ from, or resemble, the
conventional approach?

■ In the immunization coverage survey described in the second exercise of Out- 
line 8, give a critical appraisal of the survey methodology. 
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Strengths and weaknesses of selected methods for 
interim assessment 



Method 


Strengths 




Weaknesses 


Modified cluster survey 


• simple to use 


• 


random selection is not used at the 


(EPI immunization coverage 


• has been shown to perform well for the 




second stage of sampling 


assessment) 


situation it was designed for 


• 


its application is limited to the 
situation it was designed for 


Case-control studies 


Advantages:
• timing of data collection: single round
• high quality of outcome data, if not based on recall
• less misclassification
• faster
• smaller sample size
• no ethical problems
Disadvantages:
• does not allow for examination of effect of exposure on more than one outcome
• not suitable for rare exposures
• highly susceptible to bias

Focus-group discussions
Advantages:
• provide insight into motivation, attitude, feelings and behaviour
• depending on the setting, could encourage free and frank expression
• provide group interaction
• responses given to please the interviewer are reduced
• questions can be clarified on the spot
• questions are less likely to be misunderstood
• unexpected discussion avenues can be followed up
Disadvantages:
• group setting could be a hindrance to free expression
• errors can be easily introduced in the discussion transcripts
• group sizes are generally small
• findings cannot be generalized
• interpretation of the findings is prone to bias and subjectivity

Delphi techniques (knowledgeable key informants)
Advantages:
• low operational cost
• community-based and problem-oriented
• resources for subsequent intervention concentrated on high-risk communities
Disadvantages:
• information needs validation
• results are mainly qualitative
• findings cannot be generalized
• interpretation of the findings is prone to bias and subjectivity

Geographical information systems
Advantages:
• provide epidemiological insights into spatially referenced relationships
• provide health researchers with capabilities for handling spatial information
• have high potential for collecting, storing, retrieving, analysing and displaying data
• enable quick and effortless linkages of large sets of spatially referenced data with spatial analytical functions and map making
Disadvantages:
• large scale geographical information systems are unlikely to be a rapid assessment method
• might need costly software and hardware outlay for efficient application
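As a rough illustration of the "spatially referenced relationships" listed for geographical information systems, the sketch below links each case to its nearest health facility by great-circle (haversine) distance. It is only a sketch, not a GIS: the facility names and coordinates are hypothetical, straight-line distance ignores roads and terrain, and a real application would use dedicated GIS software and projected coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius of the Earth


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_facility(case_location, facilities):
    """Return (facility name, distance in km) of the facility closest to one case."""
    lat, lon = case_location
    return min(
        ((name, haversine_km(lat, lon, f_lat, f_lon)) for name, (f_lat, f_lon) in facilities.items()),
        key=lambda item: item[1],
    )


# Hypothetical facilities and case locations (latitude, longitude), for illustration only.
facilities = {"Clinic A": (5.55, -0.20), "Clinic B": (5.65, -0.10)}
cases = [(5.56, -0.21), (5.64, -0.11), (5.62, -0.13)]

for case in cases:
    name, km = nearest_facility(case, facilities)
    print(f"case at {case}: nearest facility {name}, {km:.1f} km")
```

Once each case carries a facility label, cases can be counted per catchment area and mapped, which is the kind of linkage the table refers to.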
OUTLINE 22 Statistical and medical ethics



Introduction to the seminar 

The development of powerful new pharmaceutical products and surgical and diagnostic
techniques, and the application of statistical methods to their evaluation, have brought in their wake
many difficult ethical problems. Some of them are well recognized, some are not so obvious, 
but all need to be taken into account when a scientific investigation is planned in any field of 
medicine. 

Objective of the seminar 

The objective of the seminar is to help students to identify, give examples of, and discuss the 
ethical problems underlying the application of statistical methods to medical investigations. 

Enabling objectives 

At the end of the seminar the students should be able to: 

(a) List the questions that should be asked about the ethics of any proposed investigation on 
patients or on healthy persons. 

(b) Explain the reasons for these questions and give examples of their implications in practical 
situations. 

(c) Explain, with examples, why it is unethical to publish results that are statistically incorrect. 

(d) Explain, with examples, why it is unethical to present statistical results in a misleading way. 

(e) Explain why it can be considered unethical not to seek statistical advice at the planning stage 
of an investigation. 



Seminar content 

The following section should be used to stimulate discussion during the seminar. 

Each question that is asked should be applied to every trial under investigation. 

For reference, the questions are listed in Handout 22.1. 

Misuse of patients 

(a) Are the proposed treatments, procedures or diagnostic techniques safe? 

• Nearly all treatments, diagnostic procedures and methods of prevention 
carry some risk. One important objective of investigation is to detect and 
measure such possible risks before a new procedure is accepted for wider 
use. 

• The seriousness of this risk has to be weighed against the seriousness 
of the disease being investigated, treated or prevented. Consider, for
example, what would be permissible in a trial of a new treatment for the
common cold, and compare it with what might be permissible in a trial of 
a new treatment for leukaemia. 

• Consider taking blood from veins (in adults, in children), exposure to
ionizing radiation, amniocentesis, cardiac catheterization.

(b) Is it ethical to withhold the treatment under evaluation from some patients
(namely, the controls)?

• If a clinician firmly believes that the new treatment under trial has clear
advantages over the usual treatment, he or she may well believe the
answer to this question to be "no", and will consequently wish not to (and
should not) participate in the controlled trial. Another clinician may feel
just as strongly that the patients given the new treatment under
evaluation are being denied the benefits (already well accepted) of the usual
treatment.

• This issue raises several other questions regarding, for example, the
reasons (need) for the controlled trial, the amount and reliability of evidence
already available on both the beneficial and the harmful effects of 
the treatment under evaluation, whether it is ethical to adopt a new 
treatment for routine use without adequate systematic investigation and 
evaluation, the nature and duration of the treatment being evaluated in 
relation to the disease being treated and to other available treatments, 
and so on. 

(c) Is it ethical to bring certain persons into the trial? 

• Consider a drug trial on a population that may contain women in the very 
early stage of pregnancy (risk of malformations), or people who are likely 
to be very sensitive to side-effects (those prone to asthma or with a
history of drug allergies).

(d) Has informed consent been obtained from all patients? 

• Consider the special problems of obtaining consent from patients who are 
mentally ill, mentally handicapped or senile, and from infants and 
children. 

(e) Is it ethical to offer inducements to people to participate in a trial?

• Consider offers of remission of sentence to prisoners, and the use of
medical students or junior doctors as guinea-pigs when, for career reasons,
they may think it unwise to refuse.

(f) Is it ethical to use double-blind techniques?

• Consider how this may interfere with desirable components of a
doctor/patient relationship, or with the doctor's need to adjust the dose of a drug
in relation to the patient's reaction to it (for example, a hypertensive patient's
reaction to antihypertensive treatment).

• In any case, the physician managing the patients must have the right and 
opportunity to break the code at any stage of the investigation if this 
becomes necessary for clinical reasons. 

(g) Is it ethical for patients to be randomly allocated to the different treatment
and control groups?







(h) How far can one go with placebos and dummy treatments? Can placebo or 
sham surgery be justified? 

(i) Who should make the decision about the answers to these questions? The
persons in charge of the investigation? All members of the investigation team?
Clinical colleagues? A formal ethics committee of clinical colleagues? A
formal ethics committee of non-medical people? A formal ethics committee of
medical and non-medical people? 
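Questions (f) and (g) presuppose three practical elements: a random allocation list, a code that hides each patient's treatment from clinicians and patients, and a way for the managing physician to break the code for one patient when clinically necessary. The sketch below illustrates all three under stated assumptions: a two-arm trial, permuted blocks of four (one common scheme, not one prescribed in this outline), and hypothetical class and function names. In practice the code key would be held by someone outside the clinical team, such as a pharmacist, rather than inside the same program.

```python
import random


def blocked_allocation(n_patients, block_size=4, seed=None):
    """Allocation list of neutral codes 'A'/'B' built from permuted blocks.

    Each block holds equal numbers of 'A' and 'B' in random order, so the
    two groups stay close to equal in size throughout recruitment.
    """
    if block_size <= 0 or block_size % 2:
        raise ValueError("block_size must be a positive even number")
    rng = random.Random(seed)
    codes = []
    while len(codes) < n_patients:
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)
        codes.extend(block)
    return codes[:n_patients]


class BlindedTrial:
    """Shows only neutral codes; every emergency code-break is logged."""

    def __init__(self, allocation, key):
        self._allocation = list(allocation)  # patient index -> code letter
        self._key = dict(key)                # code letter -> actual treatment
        self.unblinding_log = []

    def label_for(self, patient):
        # What the clinician and the patient see: the neutral code only.
        return self._allocation[patient]

    def break_code(self, patient, reason):
        # Reveal one patient's treatment for clinical reasons and record why.
        self.unblinding_log.append((patient, reason))
        return self._key[self._allocation[patient]]


allocation = blocked_allocation(12, block_size=4, seed=2024)
trial = BlindedTrial(allocation, key={"A": "new drug", "B": "usual treatment"})
print([trial.label_for(i) for i in range(12)])
print(trial.break_code(3, "severe adverse reaction"))
print(trial.unblinding_log)
```

Breaking the code for one patient leaves the others blinded, and the log shows afterwards how often, and why, blinding had to be abandoned.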

Misuse of statistics 

(a) Why is it unethical to publish results that for statistical reasons are
incorrect? The following points should be considered:

• False-positive results. A statistically poorly designed randomized controlled 
trial may, falsely, show a clinically important benefit from a new drug or 
diagnostic procedure. This could lead to the decision that further trials 
would be unnecessary and unethical, and to wrongful use of the drug. 

• False-negative results. An important benefit may be missed because a
statistically poor investigation has produced negative results. The investigation
may be considered too costly to repeat. This could seriously delay or
prevent the adoption of the beneficial procedure.

• Incompetent data analysis. A clinically important benefit may be concealed 
within an investigation because the results have been incompetently 
analysed. 

(b) Why is it unethical to present results in a misleading way? The following 
points should be considered: 

• use of improper scales in graphical presentations; 

• use of logarithmic scales to heighten a change of direction in a trend; 

• interpretation of measurements lying more than 2 SD from the mean value 
as always being an indication of a pathological condition; 

• misrepresentation of sampling method (for example, calling a method
"random" when in fact it was purposeful or even haphazard);

• failure to mention the size and effect of the non-response and drop-out 
rates; 

• misrepresenting, or failing to report, other weaknesses and limitations in 
the design and management of the investigations that could affect the 
validity of the conclusions drawn. 

(c) Why should professional statistical advice be sought at the beginning of an 
investigation? Possible explanations include: 

• to ensure that the sample size is large enough for a clinically important
effect, if present, to be shown as statistically significant, taking into account both type 1 (false-positive) and type 2 (false-negative) errors;

• to ensure that the design of the investigation and the data collected will 
permit possible sources of bias to be uncovered and allowed for, or will 
have prevented or minimized their occurrence (possible sources of 
bias include ill-defined populations, self-selection of the study subjects,
use of non-calibrated measuring instruments, and subjective instrument
reading and data recording);



• to plan ahead for the analysis of the data and the use of computers,
appropriate statistical methods and significance tests.
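The link between sample size and type 1 and type 2 errors can be made concrete with the standard normal-approximation formula for the number of subjects per group needed to compare two proportions with a two-sided test. This is a minimal sketch, not a method given in this outline: the function name is hypothetical and the response rates of 30% and 20% are invented for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to compare two proportions (two-sided test).

    alpha: accepted probability of a type 1 error (a false-positive result)
    power: 1 - beta, where beta is the accepted probability of a type 2 error
           (a false-negative result)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    root_null = math.sqrt(2 * p_bar * (1 - p_bar))
    root_alt = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    n = (z_alpha * root_null + z_beta * root_alt) ** 2 / (p1 - p2) ** 2
    return math.ceil(n)  # round up: a fraction of a patient is not possible


# Hypothetical example: 30% respond to the new treatment, 20% to the usual one.
print(n_per_group(0.30, 0.20))              # 80% power
print(n_per_group(0.30, 0.20, power=0.90))  # 90% power needs more subjects
print(n_per_group(0.30, 0.25))              # a smaller difference needs many more
```

A trial with only, say, 50 patients per group in this situation would have a high chance of missing the benefit altogether, which is the false-negative problem described under (a).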



Mode of conduct of the seminar 

The subject of the ethics of statistical investigations in medicine is suitable for presentation as a 
student-centred seminar. All students are likely to be interested in, and to have opinions on, the
subject, which lends itself well to open discussion.

At least two weeks before the seminar, choose two students to prepare material for formal
presentations, one on the misuse of patients, the other on the misuse of statistics. Give them a short
reading list to provide background to the questions listed in Handout 22.1. Be available for 
consultation as required. 

If the seminar is planned to last one hour, a reasonable division of the time within that hour 
would be as follows. 

After a very short introduction (less than 5 minutes) by the tutor, the first student speaks for not 
more than 15 minutes on the misuse of patients. This is followed by a general discussion of 
about 10 minutes. The second student then speaks for not more than 15 minutes on the misuse 
of statistics and, as before, this is followed by a 10-minute discussion. To conclude, the tutor 
summarizes the important points that have been made. 

If an hour and a half can be arranged for the seminar, the extra time should be given to
discussion rather than to the presentations.

The students should be encouraged to use appropriate visual aids in their presentations and, 
because of the tight schedule, should have a trial run through their talks in front of the tutor 
before the seminar. 

The value of the seminar will be greatly enhanced if an experienced clinician is present to give 
an opinion on some of the more difficult clinical examples. 

The tutor should briefly make the following three points: 

(a) Ethics, in the context of this seminar, refers to moral issues involved in statistical
investigations in medicine, the scientific integrity of the investigator, and the obligations implicit in
the doctor/patient relationship (see the Declaration of Helsinki, in Handout 22.2). 

(b) Ethical problems are inherent in the nature of medicine, and doctors have always
appreciated this (refer to the Hippocratic oath).

(c) The application of scientific methods (of which statistics is an essential component) to the 
evaluation of new drugs, diagnostic procedures, surgical techniques and methods of
prevention has highlighted the issues, and heightened doctors' awareness of the problems and the
need to face them. 
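For the point made under Misuse of statistics (b) about values lying more than 2 SD from the mean, a tutor may find a quick calculation useful. The sketch below assumes that the measurement is Normally distributed in healthy people and, for the second function, that repeated tests are independent; both are simplifying assumptions, and the function names are hypothetical.

```python
from statistics import NormalDist


def fraction_beyond(k_sd=2.0):
    """Share of a Normal population lying more than k_sd SDs from the mean (both tails)."""
    return 2 * (1 - NormalDist().cdf(k_sd))


def prob_flagged_at_least_once(n_tests, k_sd=2.0):
    """Chance that a healthy person has at least one 'abnormal' result in n independent tests."""
    return 1 - (1 - fraction_beyond(k_sd)) ** n_tests


print(f"healthy people beyond 2 SD: {fraction_beyond():.2%}")
for n in (1, 5, 10, 20):
    print(f"{n:>2} tests: {prob_flagged_at_least_once(n):.0%} flagged at least once")
```

About 1 healthy person in 22 lies outside the 2 SD limits on any single measurement, and with a battery of tests most healthy people would show at least one "abnormal" value.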
Questions to be answered during the seminar
on statistical and medical ethics 



Misuse of patients 

(a) Are the proposed treatments, procedures or diagnostic techniques safe?

(b) Is it ethical to withhold the treatment under evaluation from some patients (namely, the controls)?

(c) Is it ethical to bring certain persons into the trial? 

(d) Has informed consent been obtained from all patients? 

(e) Is it ethical to offer inducements to people to participate in a trial? 

(f) Is it ethical to use double-blind techniques? 

(g) Is it ethical for patients to be randomly allocated to the different treatment and control groups? 

(h) How far can one go with placebos and dummy treatments? Can placebo or sham surgery be justified? 

(i) Who should make the decision about the answers to these questions? The persons in charge of the investigation?
All members of the investigation team? Clinical colleagues? A formal ethics committee of clinical colleagues? 
A formal ethics committee of non-medical people? A formal ethics committee of medical and non-medical 
people? 

Misuse of statistics 

(a) Why is it unethical to publish results that for statistical reasons are incorrect?

(b) Why is it unethical to present results in a misleading way? 

(c) Why should professional statistical advice be sought at the beginning of an investigation? 
World Medical Association Declaration of Helsinki 1



Recommendations guiding physicians in biomedical research involving human subjects 

Adopted by the 18th World Medical Assembly, Helsinki, Finland, June 1964, and amended by the 29th World Medical
Assembly, Tokyo, Japan, October 1975, the 35th World Medical Assembly, Venice, Italy, October 1983, the 41st World 
Medical Assembly, Hong Kong, September 1989 and the 48th General Assembly, Somerset West, Republic of South 
Africa, October 1996. 

Introduction 

It is the mission of the physician to safeguard the health of the people. His or her knowledge and conscience are 
dedicated to the fulfillment of this mission. 

The Declaration of Geneva of the World Medical Association binds the physician with the words, "The health of my
patient will be my first consideration," and the International Code of Medical Ethics declares that, "A physician shall 
act only in the patient's interest when providing medical care which might have the effect of weakening the physical 
and mental condition of the patient." 

The purpose of biomedical research involving human subjects must be to improve diagnostic, therapeutic and 
prophylactic procedures and the understanding of the aetiology and pathogenesis of disease. 

In current medical practice most diagnostic, therapeutic or prophylactic procedures involve hazards. This applies 
especially to biomedical research. 

Medical progress is based on research which ultimately must rest in part on experimentation involving human
subjects.

In the field of biomedical research a fundamental distinction must be recognized between medical research in which 
the aim is essentially diagnostic or therapeutic for a patient, and medical research, the essential object of which is 
purely scientific and without implying direct diagnostic or therapeutic value to the person subjected to the research. 

Special caution must be exercised in the conduct of research which may affect the environment, and the welfare of 
animals used for research must be respected. 

Because it is essential that the results of laboratory experiments be applied to human beings to further scientific 
knowledge and to help suffering humanity, the World Medical Association has prepared the following
recommendations as a guide to every physician in biomedical research involving human subjects. They should be kept under review
in the future. It must be stressed that the standards as drafted are only a guide to physicians all over the world. 
Physicians are not relieved from criminal, civil and ethical responsibilities under the laws of their own countries. 

I. Basic principles 

1. Biomedical research involving human subjects must conform to generally accepted scientific principles and should 
be based on adequately performed laboratory and animal experimentation and on a thorough knowledge of the 
scientific literature. 



1 1996 version. Reproduced by permission of the World Medical Association. 
2. The design and performance of each experimental procedure involving human subjects should be clearly
formulated in an experimental protocol which should be transmitted for consideration, comment and guidance to a
specially appointed committee independent of the investigator and the sponsor provided that this independent 
committee is in conformity with the laws and regulations of the country in which the research experiment is 
performed. 

3. Biomedical research involving human subjects should be conducted only by scientifically qualified persons and 
under the supervision of a clinically competent medical person. The responsibility for the human subject must 
always rest with a medically qualified person and never rest on the subject of the research, even though the 
subject has given his or her consent. 

4. Biomedical research involving human subjects cannot legitimately be carried out unless the importance of the 
objective is in proportion to the inherent risk to the subject. 

5. Every biomedical research project involving human subjects should be preceded by careful assessment of
predictable risks in comparison with foreseeable benefits to the subject or to others. Concern for the interests of the
subject must always prevail over the interest of science and society. 

6. The right of the research subject to safeguard his or her integrity must always be respected. Every precaution 
should be taken to respect the privacy of the subject and to minimize the impact of the study on the subject's 
physical and mental integrity and on the personality of the subject. 

7. Physicians should abstain from engaging in research projects involving human subjects unless they are satisfied 
that the hazards involved are believed to be predictable. Physicians should cease any investigation if the hazards 
are found to outweigh the potential benefits. 

8. In publication of the results of his or her research, the physician is obliged to preserve the accuracy of the results. 
Reports of experimentation not in accordance with the principles laid down in this Declaration should not be 
accepted for publication. 

9. In any research on human beings, each potential subject must be adequately informed of the aims, methods, 
anticipated benefits and potential hazards of the study and the discomfort it may entail. He or she should be 
informed that he or she is at liberty to abstain from participation in the study and that he or she is free to 
withdraw his or her consent to participation at any time. The physician should then obtain the subject's freely- 
given informed consent, preferably in writing. 

10. When obtaining informed consent for the research project the physician should be particularly cautious if the 
subject is in a dependent relationship to him or her or may consent under duress. In that case the informed 
consent should be obtained by a physician who is not engaged in the investigation and who is completely 
independent of this official relationship. 

11. In case of legal incompetence, informed consent should be obtained from the legal guardian in accordance with 
national legislation. Where physical or mental incapacity makes it impossible to obtain informed consent, or 
when the subject is a minor, permission from the responsible relative replaces that of the subject in accordance 
with national legislation. 

Whenever the minor child is in fact able to give a consent, the minor's consent must be obtained in addition to 
the consent of the minor's legal guardian. 

12. The research protocol should always contain a statement of the ethical considerations involved and should
indicate that the principles enunciated in the present Declaration are complied with.



II. Medical research combined with professional care (clinical research) 

1. In the treatment of the sick person, the physician must be free to use a new diagnostic and therapeutic measure,
if in his or her judgment it offers hope of saving life, reestablishing health or alleviating suffering. 
2. The potential benefits, hazards and discomfort of a new method should be weighed against the advantages of
the best current diagnostic and therapeutic methods. 

3. In any medical study, every patient — including those of a control group, if any — should be assured of the best 
proven diagnostic and therapeutic method. This does not exclude the use of inert placebo in studies where no 
proven diagnostic or therapeutic method exists. 

4. The refusal of the patient to participate in a study must never interfere with the physician-patient relationship. 

5. If the physician considers it essential not to obtain informed consent, the specific reasons for this proposal should
be stated in the experimental protocol for transmission to the independent committee (I, 2). 

6. The physician can combine medical research with professional care, the objective being the acquisition of new 
medical knowledge, only to the extent that medical research is justified by its potential diagnostic or therapeutic 
value for the patient. 

III. Non-therapeutic biomedical research involving human subjects (non-clinical biomedical research)

1. In the purely scientific application of medical research carried out on a human being, it is the duty of the physician 
to remain the protector of the life and health of that person on whom biomedical research is being carried out. 

2. The subjects should be volunteers — either healthy persons or patients for whom the experimental design is not 
related to the patient's illness. 

3. The investigator or the investigating team should discontinue the research if in his/her or their judgment it may, 
if continued, be harmful to the individual. 

4. In research on man, the interest of science and society should never take precedence over considerations related 
to the wellbeing of the subject. 

OUTLINE 23 Critique of a scientific paper 



Introduction to the seminar 

Health professionals rely largely on the literature to keep up with the developments in medicine. 
They encounter published papers of wide-ranging quality and of varying scientific integrity. 
Readers have to make their own independent judgement regarding the adequacy and reliability 
of the information given, the validity of the conclusions drawn, and the recommendations
offered by the authors. Developing the ability to do so is one of the major aims of a course in
biostatistics for health professionals.

Objective of the seminar 

The objective of this seminar is to provide the students with an opportunity to apply the
knowledge gained so far in this course to evaluate the evidence presented in a published scientific
paper, and to illustrate the manner in which such a paper should be assessed critically. 

Enabling objectives 

At the end of the seminar the students should be able to: 

(a) List the major elements that need to be examined when making a critical assessment of a 
scientific paper. 

(b) Demonstrate some capability to deal with each of these elements, with reference to a given 
published paper or manuscript. 

Required previous knowledge 

Contents of Outlines 1-22. 



Seminar content 

The content of the seminar will essentially follow the headings and items of the 
outline in Handout 23.1, modified if necessary to suit the particular paper being 
discussed. 



Mode of conduct of the seminar 

As an introduction to the seminar, the teacher should highlight the need for this type of activity. 

The seminar should be centred around the students. The teacher should select a paper from the 
current literature, preferably of local interest, that has substantial but elementary statistical
content. The content of the paper should be understandable to the students. A group of up to 10
students may be assigned a paper. In the case of a larger group of students, more than one paper 
would be required. 
The guidelines given in Handout 23.1 cover some important non-statistical aspects. They too
should be adequately discussed. 

The task of the teacher is to ensure that the students are objective, unbiased, realistic and
generally sound in their evaluation of the paper. Constructive, rather than destructive, criticism should
be encouraged. Suggestions and recommendations made by the students should likewise be 
discussed critically, rather than simply being put forward superficially as possible options. The 
advantages and disadvantages of the different approaches suggested should be compared with 
those of the approach taken by the authors of the paper. 

It is important for the teacher to be sufficiently familiar with the contents and the details of the 
paper to be able to draw the attention of the students to those points which they miss in their 
discussion. 

The main objective of the critique is to evaluate the validity of the conclusions drawn in the 
paper. The teacher may have to intervene from time to time during the seminar in order to 
prevent the discussion from straying too far from this objective. 



Checklist for the critique of a scientific paper



Introduction 

(a) Are the objectives clearly and precisely stated?

(b) To which population are the results intended to be generalized? 

Methodology 

(a) Is the study an experiment, planned observations, or an analysis of records?

(b) Does the design meet the objectives? 

(c) Is the study based on a sample? Is it a representative sample from the target population? Are the method of 
sampling and sample size adequate? 

(d) If the study involves non-representative subjects such as volunteers, would the resulting bias affect the results?

(e) If the study is an experiment, is there randomization and is the randomization procedure explained? 

(f) Are the definitions, methods and tools used to measure the antecedents and the outcomes appropriate, valid and 
reliable? 

(g) Does the sample size provide for possible non-response and, if it is a follow-up study, drop-outs? What would be 
the effect of such non-responses and drop-outs? 

(h) Is there any possible self-selection? 

(i) Is the overall methodology adequate for achieving the objectives of the study?

(j) Is the methodology ethically sound?

Results 

(a) Are the findings stated clearly and concisely, yet in sufficient detail for the readers to make their own judgement?

(b) Are the conclusions based on facts, with the opinions clearly indicated as such? 

(c) Are all the tables and graphs needed? Are there tables and illustrations which should have been given? Is there 
any avoidable duplication? 

(d) Are there any inconsistencies between findings on two or more variables? 

(e) Are the statistical methods appropriate for the kind of variables observed, for the design adopted and for the 
number of subjects studied? 

(f) Is the analysis focused on the objectives initially set out? Are coincidental findings specified?

(g) Do the results follow from the analysis presented? Are confounding and intervening variables properly
handled in the analysis?

Discussion 

(a) Is a proper explanation given for the results obtained?
(b) Is statistical significance clearly distinguished from medical significance?

(c) Are the consistencies and the inconsistencies with present knowledge (generally accompanied by a review of the 
literature) fully explained? 

(d) Are the conclusions based on the results as presented? Are the reliability, validity and limitations of these
conclusions adequately discussed?

(e) Do these conclusions really answer the questions posed in the objectives? 

(f) Is it clearly stated how the results of the investigation advance the current knowledge on the topic? 
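For Methodology item (g), a common rule of thumb is to divide the required sample size by the proportion of subjects expected to complete the study. The sketch below illustrates it with a hypothetical function name and an illustrative required size of 200. It only keeps the numbers up: it does nothing about the bias that arises if those lost differ from those who remain, which is the second half of that checklist question.

```python
import math


def inflate_for_losses(n_required, loss_rate):
    """Number to recruit so that about n_required subjects remain after losses.

    loss_rate: expected proportion lost to non-response or drop-out (0 <= loss_rate < 1).
    """
    if not 0 <= loss_rate < 1:
        raise ValueError("loss_rate must be in [0, 1)")
    return math.ceil(n_required / (1 - loss_rate))


for loss in (0.0, 0.10, 0.25):
    print(f"{loss:.0%} expected loss -> recruit {inflate_for_losses(200, loss)}")
```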
ANNEX A



Supplementary data sets 



A.1 Intra-ocular pressure of 135 adults 

A total of 135 adult factory workers were examined for intra-ocular pressure at the eye
department of Korle-Bu Teaching Hospital, Accra, Ghana. The Perkins applanation tonometer
was used to measure the pressure to the nearest integer. The following data record the age, sex, 
right and left eye pressure measurements, the difference between the eye measurements, and 
an assessment of the risk of glaucoma. The data are given in mmHg, but may also be expressed 
in kPa (1 mmHg = 0.133 kPa). (The data are reproduced by courtesy of Dr Christine
Ntim-Amponsah.)



Age 

(yrs) 


Sex 


Rf 


Lt b 


Diff 


Potential for 
glaucoma 


Age 


Sex 


Rt a 


Lt b 


Diff 


Potential for 
glaucoma 


24 


M 


20 


27 


-7 


high 


52 


M 


18 


12 


6 


high 


26 


M 


16 


13 


3 


low 


71 


M 


14 


14 


0 


normal 


49 


M 


13 


14 


-1 


normal 


28 


M 


17 


16 


1 


normal 


51 


M 


19 


22 


-3 


low 


41 


M 


19 


17 


2 


low 


17 


M 


14 


14 


0 


normal 


35 


M 


16 


16 


0 


normal 


52 


M 


13 


14 


-1 


normal 


31 


M 


19 


19 


0 


normal 


39 


M 


18 


18 


0 


normal 


54 


M 


14 


14 


0 


normal 


23 


M 


14 


16 


-2 


low 


35 


M 


10 


13 


-3 


low 


40 


M 


14 


13 


1 


normal 


42 


M 


16 


15 


1 


normal 


29 


M 


12 


10 


2 


low 


32 


M 


14 


12 


2 


low 


30 


M 


13 


14 


-1 


normal 


45 


M 


18 


17 


1 


normal 


56 


M 


14 


13 


1 


normal 


46 


M 


13 


14 


-1 


normal 


26 


M 


15 


14 


1 


normal 


48 


M 


15 


18 


-3 


low 


57 


M 


13 


14 


-1 


normal 


50 


M 


14 


11 


3 


low 


40 


M 


21 


17 


4 


medium 


39 


M 


9 


7 


2 


low 


39 


M 


13 


13 


0 


normal 


43 


M 


16 


12 


4 


medium 


41 


M 


11 


12 


-1 


normal 


46 


M 


20 


20 


0 


normal 


42 


M 


13 


16 


-3 


low 


35 


M 


13 


14 


-1 


normal 


41 


M 


20 


17 


3 


low 


41 


M 


13 


12 


1 


normal 


42 


M 


23 


19 


4 


medium 


26 


M 


14 


12 


2 


low 


41 


M 


9 


12 


-3 


low 


44 


M 


21 


18 


3 


low 


40 


M 


18 


19 


-1 


normal 


45 


M 


27 


20 


7 


high 


36 


M 


9 


10 


-1 


normal 


14 


M 


14 


13 


1 


normal 


52 


M 


13 


15 


-2 


low 


45 


M 


15 


15 


0 


normal 


55 


M 


19 


17 


2 


low 


49 


F 


12 


12 


0 


normal 


38 


F 


15 


19 


— 4 


medium 


42 


F 


16 


16 


0 


normal 


35 


F 


14 


11 


3 


low 


23 


F 


15 


18 


-3 


low 
Age (yrs)  Sex  Rt a  Lt b  Diff c  Potential for glaucoma
21         F    12    9     3       low
40         F    13    13    0       normal
51         F    16    17    -1      normal
13         F    14    17    -3      low
29         F    16    13    3       low
40         F    18    18    0       normal
43         F    12    12    0       normal
28         F    16    14    2       low
38         F    13    12    1       normal
56         F    14    13    1       normal
28         F    13    12    1       normal
36         F    15    14    1       normal
22         F    17    19    -2      low
34         F    13    14    -1      normal
40         F    14    14    0       normal
14         F    16    12    4       medium
28         F    12    10    2       low
46         F    12    12    0       normal
39         F    18    16    2       low
47         F    17    16    1       normal
49         F    18    18    0       normal
64         F    16    17    -1      normal
35         F    15    16    -1      normal
43         F    11    11    0       normal
42         F    12    12    0       normal
46         F    18    22    -4      medium
41         F    13    14    -1      normal
31         F    20    20    0       normal
52         F    14    12    2       low
11         F    10    11    -1      normal
67         F    17    18    -1      normal
26         F    16    13    3       low
35         F    15    9     6       high
50         F    13    12    1       normal
25         F    19    21    -2      low
36         F    21    18    3       low
57         F    15    13    2       low
40         F    17    17    0       normal
39         F    16    17    -1      normal
32         F    12    12    0       normal
26         F    11    11    0       normal
26         F    10    12    -2      low
34         F    15    10    5       high
53         F    16    15    1       normal
39         F    18    17    1       normal
28         F    17    15    2       low
47         F    10    12    -2      low
41         F    14    22    -8      high
35         M    18    11    7       high
53         M    15    15    0       normal
36         M    23    20    3       low
29         M    6     6     0       normal
28         M    26    20    6       high
33         M    13    16    -3      low
34         M    17    14    3       low
43         M    17    17    0       normal
36         M    11    10    1       normal
36         M    12    15    -3      low
43         M    14    12    2       low
36         M    15    13    2       low
24         M    22    24    -2      low
42         M    11    11    0       normal
32         M    18    17    1       normal
40         M    16    13    3       low
43         M    15    15    0       normal
33         M    12    12    0       normal
26         M    12    15    -3      low
41         M    14    13    1       normal
46         M    19    16    3       low
44         M    11    12    -1      normal
33         M    20    22    -2      low
40         M    10    8     2       low
40         M    11    11    0       normal
32         M    17    14    3       low
42         M    21    24    -3      low
41         M    15    13    2       low
39         M    21    16    5       high
71         M    14    12    2       low
32         F    13    12    1       normal
38         F    13    12    1       normal
33         F    9     8     1       normal

a Right eye (mmHg).
b Left eye (mmHg).
c Difference between the right and left eye measurements.
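Footnote c defines Diff as the right-eye reading minus the left-eye reading. The short Python sketch below computes that column and assigns a category. The cut-offs it uses (a difference of at most 1 mmHg is normal, 2 to 3 is low, 4 is medium, 5 or more is high) are an assumption inferred from the rows listed here; they are not stated in this annex, and the function names are hypothetical.

```python
def pressure_difference(rt_mmhg: int, lt_mmhg: int) -> int:
    """Footnote c: right-eye reading minus left-eye reading (mmHg)."""
    return rt_mmhg - lt_mmhg


def potential_for_glaucoma(diff_mmhg: int) -> str:
    """Category rule INFERRED from the listed rows; not stated in the source.

    Only the size of the right/left difference matters, not its sign.
    """
    gap = abs(diff_mmhg)
    if gap <= 1:
        return "normal"
    if gap <= 3:
        return "low"
    if gap == 4:
        return "medium"
    return "high"


# A few rows from the listing: (Rt, Lt, category as printed)
for rt, lt, printed in [(12, 9, "low"), (16, 12, "medium"), (14, 22, "high"), (13, 13, "normal")]:
    d = pressure_difference(rt, lt)
    print(rt, lt, d, potential_for_glaucoma(d), printed)
```

Because the rule uses the absolute difference, a reading of 18 and 22 mmHg (Diff -4) falls in the same category as 16 and 12 mmHg (Diff 4), as it does in the listing.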
A.2 Relationship between breastfeeding practices and the duration of
postpartum amenorrhoea

In a WHO study of the relationship between breastfeeding practices and the duration
of postpartum amenorrhoea, 550 women were recruited at the birth of their babies, and each
mother-infant pair was followed up until the end of amenorrhoea. Several variables were
measured for each mother and her infant on admission to the study. Five of the admission
variables for the first 116 subjects recruited in one of the centres in South America are shown
below.



Age 

(yrs) 


Social 

class" 


Smoking 6 


Alcohol' 


Ht d 


Wt e 


Age 

(yrs) 


Social 

class" 


Smoking 6 


Alcohol' 


Ht d 


Wt e 


31 


1 


No 


Yes 


167 


65 


27 


II 


No 


Yes 


156 


59 


27 


III 


No 


No 


154 


60 


31 


1 


No 


No 


145 


60 


29 


II 


Yes 


Yes 


158 


56 


29 


1 


No 


No 


160 


55 


33 


1 


No 


No 


156 


63 


27 


II 


Yes 


No 


153 


51 


23 


1 


No 


Yes 


160 


69 


22 


1 


No 


Yes 


158 


63 


24 


III 


Yes 


Yes 


162 


65 


24 


1 


No 


No 


155 


51 


29 


III 


Yes 


No 


152 


55 


31 


III 


No 


Yes 


160 


64 


25 


II 


No 


No 


160 


65 


30 


II 


Yes 


Yes 


154 


59 


26 


1 


No 


Yes 


163 


65 


23 


II 


Yes 


No 


153 


66 


37 


1 


No 


No 


152 


54 


22 


III 


No 


Yes 


158 


54 


31 


1 


No 


Yes 


150 


53 


21 


III 


No 


No 


148 


56 


20 


III 


No 


No 


158 


59 


26 


II 


Yes 


No 


167 


61 


29 


II 


Yes 


No 


164 


65 


29 


III 


No 


No 


159 


63 


29 


II 


Yes 


No 


159 


64 


28 


II 


No 


No 


159 


55 


32 


II 


No 


Yes 


155 


60 


22 


1 


No 


No 


164 


66 


24 


1 


No 


No 


158 


61 


21 


1 


Yes 


Yes 


157 


52 


28 


II 


No 


No 


151 


55 


23 


II 


Yes 


Yes 


152 


52 


20 


II 


No 


Yes 


150 


51 


28 


1 


No 


Yes 


149 


58 


29 


II 


No 


No 


154 


56 


23 


II 


No 


No 


151 


68 


25 


III 


No 


No 


154 


67 


23 


II 


Yes 


No 


160 


61 


22 


III 


No 


No 


161 


60 


22 


III 


No 


Yes 


165 


57 


27 


III 


No 


No 


153 


59 


26 


1 


Yes 


Yes 


155 


56 


28 


1 


No 


No 


155 


59 


20 


III 


Yes 


No 


150 


51 


21 


III 


Yes 


No 


147 


51 


28 


II 


No 


No 


155 


61 


27 


II 


No 


No 


154 


64 


25 


1 


Yes 


Yes 


165 


65 


27 


1 


No 


No 


159 


64 


22 


III 


Yes 


No 


157 


60 


25 


III 


No 


No 


151 


54 


23 


II 


No 


No 


157 


63 


25 


1 


Yes 


Yes 


157 


59 


25 


II 


No 


Yes 


147 


54 


30 


1 


No 


Yes 


160 


60 


28 


1 


No 


Yes 


165 


72 


28 


III 


No 


No 


152 


56 


22 


1! 


No 


No 


152 


52 


20 


1 


No 


No 


154 


53 


23 


Ill 


No 


Yes 


152 


55 


22 


1 


No 


No 


152 


57 


24 


II 


No 


No 


152 


56 


29 


III 


No 


No 


156 


64 


30 


III 


Yes 


Yes 


158 


62 
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Age 

(yrs) 


Social 

class 3 


Smoking b 


Alcohol' 


Ht d 


Wt e 


Age 

(yrs) 


Social 

class 3 


Smoking b 


Alcohol' 


Ht d 


Wt e 


22 


III 


No 


No 


145 


57 


26 


III 


No 


Yes 


142 


50 


24 


II 


No 


No 


148 


49 


32 


II 


No 


No 


156 


68 


22 


III 


Yes 


Yes 


160 


63 


28 


1 


Yes 


No 


160 


56 


30 


II 


No 


Yes 


164 


65 


22 


III 


No 


No 


155 


57 


35 


1 


No 


No 


163 


64 


24 


II 


Yes 


No 


155 


69 


25 


1 


No 


No 


157 


61 


25 


1 


No 


Yes 


159 


61 


26 


II 


No 


No 


160 


63 


22 


1 


Yes 


Yes 


162 


73 


21 


1 


No 


Yes 


158 


66 


29 


II 


No 


No 


155 


69 


23 


II 


No 


No 


154 


57 


25 


II 


No 


Yes 


163 


64 


28 


II 


No 


Yes 


151 


58 


26 


II 


Yes 


No 


148 


53 


29 


II 


No 


Yes 


162 


65 


21 


III 


Yes 


No 


154 


66 


27 


III 


No 


No 


139 


52 


29 


1 


No 


No 


149 


48 


27 


1 


No 


No 


155 


58 


27 


1 


No 


Yes 


153 


59 


27 


III 


No 


No 


164 


59 


26 


II 


No 


No 


148 


56 


26 


II 


No 


Yes 


153 


59 


26 


1 


No 


No 


154 


54 


20 


III 


No 


No 


148 


58 


29 


II 


No 


No 


158 


60 


32 


III 


No 


No 


152 


60 


32 


1 


Yes 


No 


157 


61 


28 


1 


Yes 


Yes 


150 


61 


30 


1 


Yes 


No 


154 


59 


30 


1 


No 


Yes 


148 


48 


26 


1 


No 


Yes 


158 


60 


27 


1 


No 


No 


154 


60 


26 


III 


Yes 


No 


150 


54 


37 


II 


No 


No 


155 


61 


29 


1 


No 


Yes 


150 


60 


28 


II 


Yes 


No 


155 


58 


27 


1 


No 


No 


158 


59 


26 


1 


No 


No 


160 


57 


24 


III 


No 


No 


147 


60 


35 


1 


No 


No 


151 


60 


20 


II 


No 


Yes 


150 


51 


26 


II 


No 


No 


156 


63 


23 


II 


No 


No 


159 


60 



a I = upper class; II = middle class; III = lower class. Social classes were defined by the investigators using local standards. 
b "No" means that the woman has never smoked. 
c "No" means that the woman has never drunk alcohol. 
d Height in cm. 
e Weight in kg. 
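A cross-tabulation of two categorical columns is a natural first step with data of this kind. The Python sketch below counts social class against smoking status for only the first six records of the listing, typed in by hand; the record layout and the helper name cross_tabulate are illustrative assumptions, not part of the source.

```python
from collections import Counter

# First six records of the listing: (age, social class, smoking, alcohol, ht_cm, wt_kg)
RECORDS = [
    (31, "I", "No", "Yes", 167, 65),
    (27, "II", "No", "Yes", 156, 59),
    (27, "III", "No", "No", 154, 60),
    (31, "I", "No", "No", 145, 60),
    (29, "II", "Yes", "Yes", 158, 56),
    (29, "I", "No", "No", 160, 55),
]


def cross_tabulate(records, row_index: int, col_index: int) -> Counter:
    """Count how often each (row category, column category) pair occurs."""
    return Counter((rec[row_index], rec[col_index]) for rec in records)


table = cross_tabulate(RECORDS, 1, 2)  # social class (rows) x smoking (columns)
for social_class in ("I", "II", "III"):
    # A Counter returns 0 for pairs that never occur, so empty cells print as 0.
    print(social_class, [table[(social_class, s)] for s in ("No", "Yes")])
```

Entering all 116 records in the same layout would give the full contingency table for this pair of variables.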
A.3 Absenteeism from work data

The following data on absenteeism due to acute upper respiratory system diseases among the
250 employees in a carpentry factory in Ankara, Turkey, relate to all such absences that
included one or more days during January 1995.



Patient   Absent from  Absent to
A         3 Jan        9 Jan
A         14 Jan       17 Jan
A         27 Jan       30 Jan
B         22 Dec       11 Jan
C         30 Dec       6 Jan
D         3 Jan        19 Jan
D         27 Jan       5 Feb
E         24 Dec       1 Feb
F         24 Jan       4 Feb
G         4 Jan        7 Feb
H         11 Jan       20 Jan
I         22 Jan       30 Jan
J         5 Jan        10 Jan
J         18 Jan       26 Jan
K         29 Dec       9 Jan
L         28 Dec       7 Jan
L         12 Jan       19 Jan
L         27 Jan       31 Jan
M         21 Jan       29 Jan
N         22 Dec       9 Feb
O         23 Jan       5 Feb
P         28 Jan       2 Feb
R         3 Jan        10 Jan
R         21 Jan       26 Jan
S         29 Dec       4 Jan
S         9 Jan        13 Jan
S         17 Jan       18 Jan
S         24 Jan       2 Feb
T         22 Jan       31 Jan
U         1 Jan        10 Jan
U         27 Jan       1 Feb








Statistical tables

Table B.1 The normal distribution (standardized deviates for two-tailed areas p)

p      0.00      0.01      0.02      0.03      0.04      0.05      0.06      0.07      0.08      0.09
0.0    ∞         2.575829  2.326348  2.170090  2.053749  1.959964  1.880794  1.811911  1.750686  1.695398
0.1    1.644854  1.598193  1.554774  1.514102  1.475791  1.439521  1.405072  1.372204  1.340755  1.310579
0.2    1.281552  1.253565  1.226528  1.200359  1.174987  1.150349  1.126391  1.103063  1.080319  1.058122
0.3    1.036433  1.015222  0.994458  0.974114  0.954165  0.934589  0.915365  0.896473  0.877896  0.859617
0.4    0.841621  0.823894  0.806421  0.789192  0.772193  0.755415  0.738847  0.722479  0.706303  0.690309
0.5    0.674490  0.658838  0.643345  0.628006  0.612813  0.597760  0.582841  0.568051  0.553385  0.538836
0.6    0.524401  0.510073  0.495850  0.481727  0.467699  0.453762  0.439913  0.426148  0.412463  0.398855
0.7    0.385320  0.371856  0.358459  0.345125  0.331853  0.318639  0.305481  0.292375  0.279319  0.266311
0.8    0.253347  0.240426  0.227545  0.214702  0.201893  0.189118  0.176374  0.163658  0.150969  0.138304
0.9    0.125661  0.113039  0.100434  0.087845  0.075270  0.062707  0.050154  0.037608  0.025069  0.012533



Table B.1 is taken from Table I of Fisher & Yates: Statistical tables for biological, agricultural and medical research, published by Longman Group
UK Limited, 1974.
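The deviates in Table B.1 can be reproduced from the inverse of the standard normal cumulative distribution: the entry for a two-tailed area p is the value z with Φ(z) = 1 − p/2. A minimal Python sketch using the standard library's statistics.NormalDist (available from Python 3.8) follows; the function name is illustrative. The p = 0.00 cell is infinite, which is why inv_cdf refuses an argument of 1.0.

```python
from statistics import NormalDist


def two_tailed_deviate(p: float) -> float:
    """Standardized deviate z such that P(|Z| > z) = p for a standard normal Z."""
    # Half of p lies in each tail, so look up the (1 - p/2) quantile.
    return NormalDist().inv_cdf(1 - p / 2)


# Row 0.0 / column 0.05, row 0.1 / column 0.00, and row 0.0 / column 0.01
for p in (0.05, 0.10, 0.01):
    print(p, round(two_tailed_deviate(p), 6))
```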



Table B.2 Distribution of t (for two-tailed tests)

        Probability of greater value, p
df      0.9     0.8     0.7     0.6     0.5     0.4     0.3     0.2     0.1     0.05    0.02    0.01    0.001
1       0.158   0.325   0.510   0.727   1.000   1.376   1.963   3.078   6.314   12.706  31.821  63.657  636.62
2       0.142   0.289   0.445   0.617   0.816   1.061   1.386   1.886   2.920   4.303   6.965   9.925   31.598
3       0.137   0.277   0.424   0.584   0.765   0.978   1.250   1.638   2.353   3.182   4.541   5.841   12.941
4       0.134   0.271   0.414   0.569   0.741   0.941   1.190   1.533   2.132   2.776   3.747   4.604   8.610
5       0.132   0.267   0.408   0.559   0.727   0.920   1.156   1.476   2.015   2.571   3.365   4.032   6.859
6       0.131   0.265   0.404   0.553   0.718   0.906   1.134   1.440   1.943   2.447   3.143   3.707   5.959
7       0.130   0.263   0.402   0.549   0.711   0.896   1.119   1.415   1.895   2.365   2.998   3.499   5.405
8       0.130   0.262   0.399   0.546   0.706   0.889   1.108   1.397   1.860   2.306   2.896   3.355   5.041
9       0.129   0.261   0.398   0.543   0.703   0.883   1.100   1.383   1.833   2.262   2.821   3.250   4.781
10      0.129   0.260   0.397   0.542   0.700   0.879   1.093   1.372   1.812   2.228   2.764   3.169   4.587
11      0.129   0.260   0.396   0.540   0.697   0.876   1.088   1.363   1.796   2.201   2.718   3.106   4.437
12      0.128   0.259   0.395   0.539   0.695   0.873   1.083   1.356   1.782   2.179   2.681   3.055   4.318
13      0.128   0.259   0.394   0.538   0.694   0.870   1.079   1.350   1.771   2.160   2.650   3.012   4.221
14      0.128   0.258   0.393   0.537   0.692   0.868   1.076   1.345   1.761   2.145   2.624   2.977   4.140
15      0.128   0.258   0.393   0.536   0.691   0.866   1.074   1.341   1.753   2.131   2.602   2.947   4.073
16      0.128   0.258   0.392   0.535   0.690   0.865   1.071   1.337   1.746   2.120   2.583   2.921   4.015
17      0.128   0.257   0.392   0.534   0.689   0.863   1.069   1.333   1.740   2.110   2.567   2.898   3.965
18      0.127   0.257   0.392   0.534   0.688   0.862   1.067   1.330   1.734   2.101   2.552   2.878   3.922
19      0.127   0.257   0.391   0.533   0.688   0.861   1.066   1.328   1.729   2.093   2.539   2.861   3.883
20      0.127   0.257   0.391   0.533   0.687   0.860   1.064   1.325   1.725   2.086   2.528   2.845   3.850
21      0.127   0.257   0.391   0.532   0.686   0.859   1.063   1.323   1.721   2.080   2.518   2.831   3.819
22      0.127   0.256   0.390   0.532   0.686   0.858   1.061   1.321   1.717   2.074   2.508   2.819   3.792
23      0.127   0.256   0.390   0.532   0.685   0.858   1.060   1.319   1.714   2.069   2.500   2.807   3.767
24      0.127   0.256   0.390   0.531   0.685   0.857   1.059   1.318   1.711   2.064   2.492   2.797   3.745
25      0.127   0.256   0.390   0.531   0.684   0.856   1.058   1.316   1.708   2.060   2.485   2.787   3.725
26      0.127   0.256   0.390   0.531   0.684   0.856   1.058   1.315   1.706   2.056   2.479   2.779   3.707
27      0.127   0.256   0.389   0.531   0.684   0.855   1.057   1.314   1.703   2.052   2.473   2.771   3.690
28      0.127   0.256   0.389   0.530   0.683   0.855   1.056   1.313   1.701   2.048   2.467   2.763   3.674
29      0.127   0.256   0.389   0.530   0.683   0.854   1.055   1.311   1.699   2.045   2.462   2.756   3.659
30      0.127   0.256   0.389   0.530   0.683   0.854   1.055   1.310   1.697   2.042   2.457   2.750   3.646
40      0.126   0.255   0.388   0.529   0.681   0.851   1.050   1.303   1.684   2.021   2.423   2.704   3.551
60      0.126   0.254   0.387   0.527   0.679   0.848   1.046   1.296   1.671   2.000   2.390   2.660   3.460
120     0.126   0.254   0.386   0.526   0.677   0.845   1.041   1.289   1.658   1.980   2.358   2.617   3.373
∞       0.126   0.253   0.385   0.524   0.674   0.842   1.036   1.282   1.645   1.960   2.326   2.576   3.291



Table B.2 is taken from Table III of Fisher & Yates: Statistical tables for biological, agricultural and medical research, published by Longman Group 
UK Limited, 1974. 
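For integer degrees of freedom, the two-tailed probability for a given t can be computed exactly from the classical finite series in θ = arctan(t/√df), so rows of Table B.2 can be checked without statistical software. The Python sketch below illustrates that approach; it is not necessarily how the printed table was computed, and the function name is hypothetical.

```python
import math


def t_two_tailed_p(t: float, df: int) -> float:
    """Return P(|T| > t) for Student's t with a positive integer df."""
    theta = math.atan(abs(t) / math.sqrt(df))
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    cos2 = cos_t * cos_t
    if df % 2 == 0:
        # Even df: P(|T| < t) = sin(theta) * [1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...]
        term = total = 1.0
        for k in range(2, df, 2):  # k = 2, 4, ..., df - 2
            term *= (k - 1) / k * cos2
            total += term
        inside = sin_t * total
    elif df == 1:
        inside = 2 * theta / math.pi  # df = 1 is the Cauchy distribution
    else:
        # Odd df >= 3: P(|T| < t) = (2/pi) * [theta + sin*cos*(1 + (2/3)cos^2 + (2*4)/(3*5)cos^4 + ...)]
        term = total = 1.0
        for k in range(3, df - 1, 2):  # k = 3, 5, ..., df - 2
            term *= (k - 1) / k * cos2
            total += term
        inside = 2 / math.pi * (theta + sin_t * cos_t * total)
    return 1.0 - inside


for df, t in [(1, 12.706), (5, 2.571), (10, 2.228), (30, 2.042)]:
    print(df, t, round(t_two_tailed_p(t, df), 4))
```

Each pair in the loop is taken from the 0.05 column, so the printed probabilities should all be very close to 0.05; the small discrepancies come from the three-decimal rounding of the table.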



Table B.3 Cumulative distribution of χ²

     Probability of greater value, p
n    0.99      0.98      0.95      0.90      0.80      0.70      0.50      0.30      0.20      0.10      0.05      0.02      0.01      0.001
1    0.000157  0.000628  0.00393   0.0158    0.0642    0.148     0.455     1.074     1.642     2.706     3.841     5.412     6.635     10.827
2    0.0201    0.0404    0.103     0.211     0.446     0.713     1.386     2.408     3.219     4.605     5.991     7.824     9.210     13.815
3    0.115     0.185     0.352     0.584     1.005     1.424     2.366     3.665     4.642     6.251     7.815     9.837     11.345    16.268
4    0.297     0.429     0.711     1.064     1.649     2.195     3.357     4.878     5.989     7.779     9.488     11.668    13.277    18.465
5    0.554     0.752     1.145     1.610     2.343     3.000     4.351     6.064     7.289     9.236     11.070    13.388    15.086    20.517
6    0.872     1.134     1.635     2.204     3.070     3.828     5.348     7.231     8.558     10.645    12.592    15.033    16.812    22.457
7    1.239     1.564     2.167     2.833     3.822     4.671     6.346     8.383     9.803     12.017    14.067    16.622    18.475    24.322
8    1.646     2.032     2.733     3.490     4.594     5.527     7.344     9.524     11.030    13.362    15.507    18.168    20.090    26.125
9    2.088     2.532     3.325     4.168     5.380     6.393     8.343     10.656    12.242    14.684    16.919    19.679    21.666    27.877
10   2.558     3.059     3.940     4.865     6.179     7.267     9.342     11.781    13.442    15.987    18.307    21.161    23.209    29.588
11   3.053     3.609     4.575     5.578     6.989     8.148     10.341    12.899    14.631    17.275    19.675    22.618    24.725    31.264
12   3.571     4.178     5.226     6.304     7.807     9.034     11.340    14.011    15.812    18.549    21.026    24.054    26.217    32.909
13   4.107     4.765     5.892     7.042     8.634     9.926     12.340    15.119    16.985    19.812    22.362    25.472    27.688    34.528
14   4.660     5.368     6.571     7.790     9.467     10.821    13.339    16.222    18.151    21.064    23.685    26.873    29.141    36.123
15   5.229     5.985     7.261     8.547     10.307    11.721    14.339    17.322    19.311    22.307    24.996    28.259    30.578    37.697
16   5.812     6.614     7.962     9.312     11.152    12.624    15.338    18.418    20.465    23.542    26.296    29.633    32.000    39.252
17   6.408     7.255     8.672     10.085    12.002    13.531    16.338    19.511    21.615    24.769    27.587    30.995    33.409    40.790
18   7.015     7.906     9.390     10.865    12.857    14.440    17.338    20.601    22.760    25.989    28.869    32.346    34.805    42.312
19   7.633     8.567     10.117    11.651    13.716    15.352    18.338    21.689    23.900    27.204    30.144    33.687    36.191    43.820
20   8.260     9.237     10.851    12.443    14.578    16.266    19.337    22.775    25.038    28.412    31.410    35.020    37.566    45.315
21   8.897     9.915     11.591    13.240    15.445    17.182    20.337    23.858    26.171    29.615    32.671    36.343    38.932    46.797
22   9.542     10.600    12.338    14.041    16.314    18.101    21.337    24.939    27.301    30.813    33.924    37.659    40.289    48.268
23   10.196    11.293    13.091    14.848    17.187    19.021    22.337    26.018    28.429    32.007    35.172    38.968    41.638    49.728
24   10.856    11.992    13.848    15.659    18.062    19.943    23.337    27.096    29.553    33.196    36.415    40.270    42.980    51.179
25   11.524    12.697    14.611    16.473    18.940    20.867    24.337    28.172    30.675    34.382    37.652    41.566    44.314    52.620
26   12.198    13.409    15.379    17.292    19.820    21.792    25.336    29.246    31.795    35.563    38.885    42.856    45.642    54.052
27   12.879    14.125    16.151    18.114    20.703    22.719    26.336    30.319    32.912    36.741    40.113    44.140    46.963    55.476
28   13.565    14.847    16.928    18.939    21.588    23.647    27.336    31.391    34.027    37.916    41.337    45.419    48.278    56.893
29   14.256    15.574    17.708    19.768    22.475    24.577    28.336    32.461    35.139    39.087    42.557    46.693    49.588    58.302
30   14.953    16.306    18.493    20.599    23.364    25.508    29.336    33.530    36.250    40.256    43.773    47.962    50.892    59.703



Table B.3 is taken from Table IV of Fisher & Yates: Statistical tables for biological, agricultural and medical research, published by Longman Group 
UK Limited, 1974. 



For larger values of n, the expression $\sqrt{2\chi^2} - \sqrt{2n-1}$ may be used as a normal deviate with unit
variance, remembering that the probability for $\chi^2$ corresponds to that of a single tail of the normal
curve.
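Two checks on Table B.3 are sketched below in Python. For an even number of degrees of freedom the upper-tail probability has a closed form, a finite Poisson-type sum. For large n, the normal approximation √(2χ²) − √(2n − 1) ≈ z in the note above rearranges to χ² ≈ ½(z + √(2n − 1))², where z is the one-tailed normal deviate. The function names are hypothetical, and the approximation is only approximate: for n = 30 and p = 0.05 it gives about 43.49 against the tabulated 43.773.

```python
import math
from statistics import NormalDist


def chi2_upper_p_even(x: float, n: int) -> float:
    """Exact P(chi-squared with n df > x), valid only for even n.

    Uses P = exp(-x/2) * sum_{k=0}^{n/2 - 1} (x/2)^k / k!
    """
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even integer")
    half = x / 2
    term = total = 1.0
    for k in range(1, n // 2):
        term *= half / k  # builds (x/2)^k / k! from the previous term
        total += term
    return math.exp(-half) * total


def chi2_critical_approx(n: int, p: float) -> float:
    """Large-n value of chi-squared exceeded with probability p, from
    sqrt(2*chi2) - sqrt(2n - 1) ~ standard normal (single tail)."""
    z = NormalDist().inv_cdf(1 - p)  # one-tailed normal deviate
    return 0.5 * (z + math.sqrt(2 * n - 1)) ** 2


print(round(chi2_upper_p_even(18.307, 10), 4))  # n = 10, p = 0.05 column
print(round(chi2_critical_approx(30, 0.05), 2))  # compare with 43.773
```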
ANNEX C Random numbers

88008  13730  06504  37113  62248  04709  17481  77450  46438  61538
01309  13263  70850  11487  68136  06265  36402  06164  35106  77350
45896  59490  98462  11032  78613  78744  13478  72648  98769  28262
50107  24914  99266  23640  76977  31340  43878  23128  03536  01590
71163  52034  03287  86680  68794  94323  95879  75529  27370  68228
76445  87636  23392  01883  27880  09235  55886  37532  46542  01416
84130  99937  86667  92780  69283  73995  00941  65606  28855  86125
00642  10003  08917  74937  57338  62498  08681  28890  60738  81521
64478  94624  82914  00608  43587  95212  92406  63366  06609  77263
02379  83441  90151  14081  28858  68580  66009  17687  49511  37211
32525  44670  57715  38888  28199  80522  06532  48322  57247  46333
01976  16524  32784  48037  78933  50031  64123  83437  09474  73179
67952  41501  45383  78897  86627  07376  07061  40959  84155  88644
38473  83533  39754  90640  98083  39201  94259  87599  50787  75352
91079  93691  11606  49357  55363  98324  30250  20794  83946  08887
72830  10186  08121  28055  95788  03739  65182  68713  63290  57801
40947  75518  59323  64104  24926  85715  67332  49282  66781  92989
44088  70765  40826  74118  62567  75996  68126  88239  57143  06455
19154  29851  16968  66744  77786  82301  99585  23995  15725  64404
13206  90988  34929  14992  07902  23622  11858  84718  22186  35386
24102  13822  56106  13672  31473  75329  45731  47361  47713  99678
59863  62284  24742  21956  95299  24066  60121  78636  61805  39904
57389  70298  05173  48492  68455  77552  87048  16953  45811  22267
63741  76077  44579  66289  88263  54780  76661  90479  79388  15317
17417  56413  35733  27600  06266  76218  42258  35198  26953  08714
85797  58089  91501  34154  96277  83412  70244  58791  64774  75699
65145  97885  44847  37158  54385  38978  20127  40639  80977  73093
24436  65453  37073  81946  36871  97212  59592  85998  34897  97593
20891  03289  98203  05888  49306  88383  56912  12792  04498  20095
81253  41034  09730  53271  92515  08932  25983  69674  72824  04456
64337  64052  30113  05069  54535  01881  16357  72140  00903  45029
35929  76261  43784  19406  26714  96021  33162  30303  81940  91598
34525  54453  43516  48537  60593  11822  89695  80143  80351  33822
27506  45413  42176  94190  29987  90828  72361  29342  72406  44942
92413  00212  35474  22456  76958  85857  85692  75341  32682  00546
76304  57063  70591  06343  38828  15904  79837  46307  40836  69182
17680  92757  40299  98105  67139  01436  68094  78222  61283  40512
43281  36931  26091  42028  62718  38898  64356  19740  77068  78392
30647  40659  23679  04204  67628  81109  73155  68299  62768  58409
26840  42152  80242  57640  19189  47061  44640  52069  98038  49113
70356  18201  88552  54591  68945  57225  92109  07030  47296  40164
28577  15590  61477  96785  90709  53143  01967  40866  86811  04804
38403  68247  63353  92870  53557  42535  06235  91986  97934  09235
87534  31527  72736  73298  67797  89494  27571  47587  53547  31389
73830  65077  51022  32879  11985  69389  06764  84624  24842  51545
24032  98536  79706  15902  86947  78664  57706  51749  94860  33561
56318  00120  85872  45897  07733  15237  57442  05430  31406  62406
58389  25189  48073  53316  84652  43202  28630  32863  07363  16011
46826  99095  64962  18086  50284  47728  67035  92946  07467  55890
97589  70925  77108  98739  57058  81215  05150  62879  44837  02277
10890  70458  41454  73113  62946  82771  24072  91593  33505  18089
55477  16684  69066  72658  73424  55250  01147  58078  97168  69002
59688  82108  69870  85266  71787  07846  31548  08558  01935  42329
80744  09229  73891  48306  63604  70829  83549  60958  25769  08967
86026  44830  93996  63509  22690  85741  43555  22962  44941  42156
93711  57131  57271  54405  64093  50501  88610  51036  27254  26865
10223  67197  79520  36563  52148  39004  96351  14319  59138  22260
74059  51819  53517  62234  38397  71718  80076  48795  05009  18003
11960  40636  60755  75707  23668  45086  53678  03116  47910  77951
01467  84719  96945  43072  50023  11928  21690  74722  62420  77690
70918  56572  72014  52221  00756  81437  79282  09838  14647  04536
36894  81550  84614  83081  08450  38782  22219  67360  89328  20001
07415  23581  78984  94824  19906  70606  09417  13999  55960  06708
60021  33739  50837  53540  77186  29730  45408  47195  89119  40244
41772  50234  47352  32239  17611  35145  80340  95114  68463  89158
69444  19478  95346  83581  90109  00573  47790  64065  60205  80643
66970  27493  75777  10117  63266  54058  74717  02382  44211  63006
73322  33272  15183  27914  83074  31286  64330  75909  77787  56056
95378  15283  62105  95780  91088  59918  57913  44220  63174  16438
29647  85768  80778  99379  51431  15459  31573  52389  01216  64665
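A common use of a random-number table is to draw a simple random sample: read the digits in groups wide enough to cover the population, keep numbers that fall in range, and skip repeats. The Python sketch below applies that rule to the first ten groups as listed above to pick 5 of the 250 employees in data set A.3, reading three digits at a time. The starting point, the reading direction and the function name are arbitrary assumptions made for illustration.

```python
# The first ten five-digit groups of the table, in the order listed above.
FIRST_ROW = "88008 13730 06504 37113 62248 04709 17481 77450 46438 61538"


def sample_from_table(digits: str, population_size: int, k: int, width: int = 3) -> list[int]:
    """Pick k distinct labels in 1..population_size from a string of random digits."""
    digits = "".join(ch for ch in digits if ch.isdigit())  # drop the spaces between groups
    chosen: list[int] = []
    # Step through the digits `width` at a time; leftover digits at the end are ignored.
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == k:
                break
    return chosen


print(sample_from_table(FIRST_ROW, 250, 5))
```

With these digits the three-digit numbers 880, 373, 504 and so on are discarded as out of range, and the selection is employees 81, 6, 136, 224 and 174. A larger sample simply needs more rows of the table.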
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Note: Page numbers in bold type indicate definitions and main discussions. 



Absolute dispersion 54 
Accu racy 

medical records 158-159 
morbidity data 115-116 
Addition rule, probability 63 
Arithmetic mean 43, 44 , 49 
Arithmetic progression 141,143 
Assessment 

EPI immunization coverage 198, 199 , 202 
health programmes x, 198-202 
scientific papers 2 1 1 
Association 91, 92, 98 
Attributes 15 - 16 , 18 
definition 20 
dichotomous 6 1 , 63 
sampling 77-78 

Bar charts 30, 32 , 35 , 42 
example 38 
Bias 183 , 199 

Binomial distribution 58-61, 64-65 
worked example 65 

Births, registration of 105, 108 - 109 , 110 

Case-control studies 199,202 
Case-fatality rate 122, 129 
Categorical data 1 1, 16 - 17 , 18, 43 
presentation 31-32 
Categorical variables 20 , 91 
Cause of death, certification 169 , 174 - 175 , 176 
Censuses 24, 105, 109-110, 141 
definition 112 
organization of 106-107 
Central limit theorem 66, 67, 68 
Central tendency 
definition 49 

measures of 43 - 50 , 52, 53 
Certification, cause of death 169, 174 - 175 , 176 
Chance see Probability 

Chi-squared (% 2 ) distribution, tables 221-222 
Chi-squared (% 2 ) test 81-82, 91, 92 , 95 
example 96 

Child health indicators 153 
Class intervals 33, 46-47 
Classes 

cumulative frequency 32, 33-34, 35 
data 31, 33, 35 
Classification 35 

of diseases (ICD) 169-175 
Classificatory scales see Nominal scales 
Clinical trials 7, 181-184 
controls 178-179, 189 
definition 181 , 189 
design 178, 182 - 183 , 184, 190 
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ethics 203-210 

Helsinki Declaration 209-210 

need for 182, 204 

preliminary planning 203, 205 - 206 , 207 
randomization 178, 189 
Cluster sampling 69, 74 
Coefficient 

of correlation 91, 92 - 93 , 95, 96 
of regression 98 
of variation 51, 52, 53 , 55, 57 
Comparisons, samples 79-80, 87-88 
Compound events 63 
Computers x, 191-197 
central tendency indices 47 
data presentation 33 
health information services 25, 26 
misuse of 193 

software 33, 191, 193 - 194 , 195 
term definitions 196-197 
Conditional probability 59, 63 
Confidence limits 66, 69, 71 
Confidentiality 157, 159 
Contingency tables 81, 91, 92, 98 
Continuous variables 16 , 20 
Contraception see Family planning 
Controls 

clinical trials 178-179, 189 , 190 
ethics 204 
Correlation 91-96 
coefficients 91, 92 - 93 , 95, 96 
examples 99-101 
spurious 98 

Cross-over trials 178, 189 , 190 
Cross-tabulation 30-31, 33, 35 , 42, 92 
example 37 

Crude birth rate 133, 136, 137 
Crude death rate 122, 124 , 129 

Data 11-22 

analysis 5, 11, 23, 181 , 199 
categories 17-18 
grouped 31, 47, 50 
measurement of 11, 14 - 18 , 19 
organization of 30-42 
raw 30, 34 

sources of 11, 12 - 13 , 18, 24-25, 146 
supplementary sets 215-219 
validation 15 , 158-159 
Data collection 4 , 7, 8, 11-13 
health surveys 181 
HIS forms 26, 29 
morbidity and disability 115 
sentinel reporting units 24, 26 
systems of 11, 12 - 13 , 18, 19, 184 
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Data presentation 30-42 
diagrammatic 30, 31 - 32 , 34 
misuse of 32 , 40-41 
illustrative examples 36-41 
labelling 32 
methods of 42 
review 46 

tabular 30, 31 , 32-33, 34, 42 
Death 

certification 169, 174 - 175 , 176 
registration 103, 108 - 109 , 110 
Death rates 122-132 
crude 122, 124 , 129 
standardized 123, 125-126 
worked examples 130-132 
Decision-making 4, 5, 8, 24 , 25 
Degrees of freedom 92, 95 
Delphi techniques 199, 200 , 202 
Demography 7, 105-113 
see also Censuses; Population dynamics; 
Populations 

Dependent variables 98 
Descriptive statistics 4-5, 63 
Design 

clinical trials 178, 182 - 183 , 184, 190 
data collection instruments 13 
health surveys 8, 178, 179 - 181 , 184 
medical research 7, 203, 205 - 206 , 207, 

209 

questionnaires 181 
Diagnoses 3, 4, 6, 8, 159, 164 
Diagrams 30, 31 - 32 , 34 
misuse of 32 , 40-41 
Dichotomy 60-61, 63 
Disability 1 14-1 17, 120 , 153 
Discrete variables 16 , 20 
Disease 
categories 6 

surveillance 12, 18,23-29, 116 
Dispersion see Absolute dispersion; Variability 
Distributions 

binomial 58, 59, 60, 61, 64-65 
frequency 31, 33, 35 , 36, 46 
multimodal 49 , 53 
normal 53, 54, 57 , 59 
patterns 33 

sampling 61, 67, 68, 71 , 80 

skewed 46, 53 

tables 

chi-squared (x 2 ) 221-222 
normal 220 

t (two-tailed tests) 220-221 
Double-blind trials 178, 189 , 204 

EPI see Expanded Programme on Immunization 
Epidemiological information processing package 
(Epi Info) 33, 191, 193 - 194 , 195 
Epidemiology x, 4, 8, 15 
Errors 4, 1 1, 18 
observer 14 

sampling 67-68, 71 , 79-80, 180 
standard 52, 57 , 67, 72 , 80 
types 1 and 2 79, 81, 86 
Estimation 67-68, 69 
precision 71 
validity 72 

Ethics 179 , 183 - 184 , 203-210 

Evaluation, scientific literature 8, 211-214 



Examples 

arithmetic mean 44 
binomial distribution 64, 65 
chi-squared (% 2 ) test 96 
correlations 99-101 
data presentation 36-41 
disease incidence 118 
disease prevalence 1 17 
frequency polygons 38, 41 
frequency tables 36-37 
health-for-all indicators 151-152 
histograms 38 
median 44-45 
mode 45 

morbidity indices 1 1 7-1 1 8 
percentiles 45 
pie charts 39 

population measurement 139-140, 144 

probability 63-65 

proportion 117 

quartiles 45 

rate/ratio 117 

sample size determination 76-78 
standardized death rates 130-132 
Mest 89-90 
weighted mean 45-46 
Z-test 87-88 

Expanded Programme on Immunization (EPI), 
coverage assessment 198,199,202 
Extrapolation 98 

Family planning 133, 135, 153 
Feedback 28 

Fertility measurement 133-137 
data sources 133, 134 
indices 134 - 135 , 136, 137 , 141, 153 
Fetal death 113 , 122, 166 
Fisher's Exact Probability Test 81 
Focus-group discussions 198, 199 - 200 , 202 
Frequency 
class 35 

distributions 31, 33, 35, 36, 46 
histograms 32 , 34 

polygons (line charts) 30, 32 , 34, 35 , 42 
examples 38, 41 
tables 30, 31, 33-34, 35 , 42 
examples 36-37 

Gaussian distribution see Normal distribution
Geographical information systems 199, 200, 202
Geometric mean 43, 49
Geometric progression 141, 143
Goodness-of-fit 61
Gross death rate 165
Gross reproduction rate 137

Handicap, definition 120
Health
administration 7, 8
definition 120
management 23, 24, 28, 159-160
measurement 114-115, 118
monitoring 145, 146, 147, 151-153
policy 28
Health-for-all indicators 151-152
Health care delivery 3, 5-7, 23
Health data see Data
Health facility statistics 157-168
mortality rates 160, 163, 165-166
use indices 157-158, 161-162, 163-164, 168
Health indicators x, 145-153
data sources 146
definitions 149
health-for-all 151-152
need for 145
types of 146, 151-153
Health information systems (HIS) x, 23-29
characteristics 25
data collection 13
definition 28
forms 26, 29
personnel 24, 26
subsystems 24
Health surveys 178, 179-181, 184
definition 179
exercises 185
interviewer training 181
planning 178, 179-181, 184
reference population 180
Helsinki Declaration, World Medical Association
206, 208-210
Heterogeneity 52, 54
HIS see Health information systems
Histograms 30, 32, 34, 35, 42
example 38
Homogeneity 52, 54
Hypothesis testing 79-90, 95

ICD see International Classification of Diseases
ICPM see International Classification of
Procedures in Medicine
Identification data, medical records 158, 163
Immunization
coverage indicators 153
EPI modified cluster survey 198, 199, 202
see also Vaccination
Impairment, definition 120
Incidence, disease 114, 118, 120
Independent events 58, 59, 60, 63
Independent variables 98
Indicators see Health indicators
Indices
central tendency 43, 46-47
health facility use 157-158, 161-162, 163-164, 168
morbidity 117-118
mortality 122-126
population growth 139-140
quality of care 160-162, 165-167
summary 43, 46, 49
variability 51, 54
Infant mortality rate 122, 129, 151, 153, 165
Infection rates 166-167
Information
importance of 7
systems x, 23-29, 199, 200, 202
Informed consent 189, 204, 209, 210
Inter-quartile range 51
Inter-Regional Conference on Teaching Statistics
to Medical Undergraduates, Karachi
(1978) ix, xi
Interim assessment, rapid methods x, 198-202
International Classification of Diseases (ICD)
23, 169, 170-175, 177
International Classification of Procedures in
Medicine (ICPM) 163
International Health Regulations 12, 21-22
International vaccination requirements 21-22
Interpolation 98
Interval estimates 66, 71
Interval scales 11, 17

Journals, medical 8 

Karachi, Inter-Regional Conference on Teaching 
Statistics to Medical Undergraduates 
(1978) ix, xi 

Labelling, data presentation 32
Levels of significance 81-82, 86
Life expectancy 125, 143, 151
Line charts see Frequency polygons
Linear regression 91, 93-94
misuse of 94
Linear relationships 91-95, 98
non-linear distinction 91, 95
Live birth, definition 113
Location, measures of 43-50

Maps 32
Maternal mortality rate 122, 129
Mean(s) 53
arithmetic 43, 44, 49
comparison of two 79, 80
computation of 50
geometric 43, 49
outliers to 46
standard error of 57, 66, 69
weighted 43, 45-46, 49
Median 43-44, 44-45, 46, 49
computation of 50
Medical records 157-168
confidentiality 157, 159
data elements 159-160, 163
data validation 158-159
definitions 163-164
identification data 158, 163
limitations 157, 159
Medical research 3, 4, 7
ethics 203-210
outcomes 63
see also Clinical trials; Health surveys
Medical significance, statistical significance
distinction 79, 80, 82
Migration 141, 143
Mode 43-44, 45, 46, 49
Modified cluster survey (EPI immunization
coverage assessment) 198, 199, 202
Monitoring see Surveillance
Morbidity 114-121, 159, 166-167
data accuracy 114, 115-117
data sources 114, 115
definition 120
health facilities 166-167
ICD data 171-173, 175-176
indices 117-118
Mortality 105, 122-132
data limitations 122, 123
data sources 122, 123-124, 125, 126
definitions 124, 129
health facilities 160, 163, 165-166
ICD data 171, 174-176
indicators 153
indices 122
Mother and child health indicators 153
Multi-factorial relationships 98
Multimodal distributions 49, 53
Multiplication rule, probability 63-64
Multistage sampling 75
Mutually exclusive events 58, 59, 60, 63

Neonatal mortality rate 122, 129
Net death rate 165
Nominal scales 11, 16, 17
Non-linear relationships 98
linear distinction 91, 95
Non-parametric tests, parametric distinction 82
Normal distribution 53, 54, 57, 59
tables 220
Normal values 8, 51-52, 53, 54, 57
definition 57
Null hypothesis 80, 81, 86, 92

Observer error 14
Odds, probability 63
Ogives 30, 32, 34, 35
One- and two-tailed tests 81, 86
Ordered array 30, 33, 35, 42
Ordinal scales 11, 16-17
Organization of data 30-42
Outliers 46, 53

P-values 80, 86
Parameters
population 71
statistics distinction 66, 67
Parametric tests, non-parametric distinction 82
Patients, misuse of 203, 207
Pearson's correlation coefficient (r) 92-93, 95, 96
misuse of 93
Percentages, computation 43
Percentiles 43-44, 45, 46-47, 49
Perinatal mortality rate/ratio 122, 129
Pie charts 30, 31, 35, 42
example 39
Placebos 189, 205
Planning
health surveys 178, 179-181, 184
public health 3, 7, 8, 23
Point estimates 66, 71
Population dynamics 7, 138-144
changes over time 139
definition 138, 143
demographic transition 138, 140, 141, 143
indices 139-140
projection 140, 141-142
example 144
Populations
at risk 114, 120
census 105, 106-107, 109-110, 112
definition 71
pyramids 139, 141, 143
sampling 13, 66-78, 180
size 105, 106, 112
surveys 4, 180
Post-neonatal mortality rate 122, 129
Precision, of an estimate 71
Prediction, outcomes 3, 5, 6
Predictive value 15
Prevalence, disease
definition 114, 120
examples 117
Probability 5, 58-65
binomial distribution 58, 59, 60-61, 64-65
worked example 65
conditional 59, 63
definition 63
laws/rules 58, 59, 63-64
normal distribution 58, 59, 60
sampling 61, 66, 73-75
Programmes, rapid interim assessment x, 198-202
Prophylactic trials 178, 182
Proportion 44, 80
definition 114, 120
example 117

Qualitative data 16
presentation 31-32
see also Categorical data
Quantitative data 11, 15-19
presentation 32
sampling 77
Quartiles 43-44, 45, 46, 49
Questionnaires
computer-based 191, 194, 195
health surveys 178, 179, 180, 181

Random error 11, 70
Random numbers, table 223-224
Random sampling 66, 68, 69, 73-74
Random variation 54
Randomized controlled trials 178, 189
ethics 204
Range 51, 52, 53, 57
Ranking scales see Ordinal scales
Rate
definition 114, 120
example 117
Ratio
definition 114, 120
example 117
Ratio scales 11, 17
Raw data 30, 34
Reduction, health data 30, 31, 34
Registration, vital see Vital registration
Regression 91, 98
coefficients 98
linear 91, 93-94, 95-96
Regular systems, data collection 11, 12-13, 18, 19
Relative frequencies 33, 35
Relevance, health indicators 149
Reliability
census data 105, 107
data measurement 11, 14, 18, 20
health indicators 149
variation effect 55
vital registration 108
Reporting, HIS 26, 29
Research see Clinical trials; Medical research
Reticulation, definition 112
Risk 5, 203-204
Routine systems, data collection 11, 12-13, 18, 19

Sample
definition 71
dependent/independent 79, 80
Sample size 13, 67, 68, 69
determination 76-78
health surveys 180-181
statistical significance 80
Sampling 13, 44, 66-78
health surveys 180-181
methods 67, 69, 70, 73-75
multistage 75
random/non-random 66, 68, 69, 73-74
systematic 74
universe 72
Sampling distribution 61, 67, 68, 71, 80
Sampling error 67-68, 71, 79-80, 180
Sampling fraction 71
Sampling unit 72
Sanitation indicators 153
Scales, data measurement 11, 16-18, 19, 63
Scatter diagrams 91, 95, 96
Scientific papers, evaluation 211-214
Seminars
ethics 203-210
scientific papers 211-214
Sensitivity
data measurement 15, 19, 20
health indicators 149
Sentinel reporting units 24, 26
Significance tests 79-90, 92
definition 86
selection 82
significance levels 81-82, 86
Simple events 63
Single-blind trials 178, 189
Skewed distributions 46, 53
Software, Epi Info 33, 191, 193-194, 195
Sources
fertility data 133, 134
health data 11, 12-13, 18, 24-25, 146
morbidity/disability data 114, 115
mortality data 122, 123-124, 125, 126
of uncertainty 3, 10
of variation 10, 51
Specificity
data measurement 15, 19, 20
health indicators 149
Standard deviation 51, 52, 53-55, 57
Standard error 52, 57, 66-69, 72, 80
Standardization, medical data/records 158
Standardized death rates 123, 125-126
worked examples 130-132
Statistical inference 79
Statistical relationship, causal relationship
distinction 91
Statistical significance see Significance tests
Statistical tables 220-224
Statistics
definition 4
misuse of 205-206, 207
role of 3-10
Stillbirth 122, 129
definition 113
Stratified random sampling 74
Summary indices 43, 46, 49
variability 51, 54
Surveillance systems 12, 18, 23-29, 116, 153
see also Data collection; Health indicators
Surveys 12
design 8
modified cluster 198, 199, 202
see also Health surveys
Survival, measurement 140, 143
Systematic sampling 74
Systematic variation 54
Systems
health information 23-29, 199, 200, 202
vital registration 108-109, 123-124

t distribution (two-tailed tests), tables 220-221
t-test 80, 81, 82, 95
worked example 89-90
Tables
chi-squared (χ²) distribution 221-222
frequency 36-37
labelling 32
normal distribution 220
random numbers 223-224
t distribution (two-tailed tests) 220-221
Tabulation, data 30, 31, 32-33, 34, 42
Tests
chi-squared (χ²) 81-82, 91, 92, 95-96
of significance see Significance tests
t-test 80, 81, 82, 89-90, 95
two-tailed 81, 86, 220-221
validation 15
Z-test 80, 81, 82, 87-88
Treatment 3, 4, 8
Trials 178, 182, 189, 190, 204
probability 63
see also Clinical trials
Two-tailed tests 81, 86, 220-221
Type 1 and 2 errors 79, 81, 86

Uncertainty
management of 3, 4, 5, 8
sources of 3, 10
see also Probability
Unit of inquiry 72
Universe
sampling 72
see also Populations

Vaccination, international requirements 21-22
Validation
medical records 158-159
test 15
Validity
data measurement 11, 14-15, 18, 20
estimates 72
health indicators 150
variation effect 54
Values, normal 43, 51-52, 53, 54, 57, 58
Variability 51-57
Variables 15-16, 18
continuous 16, 20
definition 20
dependent/independent 98
discrete 16, 20
distribution patterns 33
Variance 51, 52, 55, 57
Variation 4
coefficient of 51-52, 53, 55, 57
handling of 3, 5-6, 8
sources of 10, 51
systematic 54
Vital registration 12-13, 18, 24, 105-113, 122
definition 28
health workers 108, 110
systems 108-109, 123-124

Water/sanitation indicators 153
Weighted mean 43, 45-46, 49
World Medical Association Declaration of
Helsinki 206, 208-210
World Summit for Children, monitoring
indicators 153

Yates' correction for continuity 81 

z-test 80, 81, 82
worked example 87-88
Zero population growth 143 




Selected WHO publications of related interest



Basic epidemiology.
Beaglehole R, Bonita R, Kjellstrom T.
Sw. fr. 18.-

Volume 2: instruction manual. 1993 (167 pages) Sw. fr. 40.-

This book is a new edition of Teaching health statistics: twenty lesson and seminar
outlines, first published by WHO in 1986. The topics covered form an internationally 
applicable basic curriculum for teaching health statistics to trainee health 
workers, which can be adapted to meet the needs of different groups of students. 

While based on those of the first edition, the lesson and seminar outlines have 
been revised and updated in both content and orientation. They offer a practical 
aid for teaching all types of future health workers, not medical students alone, 
enabling them to acquire competence in the application of statistical principles 
and methods. The book covers not only the conventional topics of data collection, 
presentation and analysis, probability and vital statistics, but also such subjects 
as health indicators, use of computers and rapid methods of interim assessment. 
Each outline provides a clear statement of objectives and suggestions for lesson 
content and structure, examples, exercises and handouts. 

The outlines are designed for selective use by teachers of statistics in preparing 
lessons and seminars and in deciding on course content. They are not intended 
to be used by students for self-instruction. 




Price: Sw. fr. 72.- 

Price in developing countries: Sw. fr. 50.40 